Preface



Since 1984, the Artificial Intelligence: Methodology, Systems, and Applica- 
tions (AIMSA) conference series has provided a biennial forum for the presen- 
tation of artificial intelligence research and development. The conference covers 
the full range of topics in AI and related disciplines and provides an ideal fo- 
rum for international scientific exchange between central/eastern Europe and the 
rest of the world. The AIMSA conferences were previously chaired by Wolfgang 
Bibel, Tim O’Shea, Philippe Jorrand, Ben du Boulay, Allan Ramsay and Fausto 
Giunchiglia. AIMSA 2000 is sponsored by ECCAI, the European Coordinating 
Committee for Artificial Intelligence. 

The AIMSA 2000 call for papers suggested that authors focus on Web pro- 
cessing: knowledge construction from the Web, agents and communication lan- 
guages, Web-based documents and interfaces, distance learning and electronic 
commerce. The importance of the conference for AI, and computer science re- 
search in general, was reflected by the presence of 60 qualified papers, out of 
which 34 were selected for publication by the Program Committee. Each paper 
was submitted to at least two independent referees and, in order to be accepted, 
had to be accepted by at least two of them. The high percentage of the accepted 
papers reflects the autonomy of the Program Committee members, who judged 
that more than 50% of the contributions were worth publishing in the Proceed- 
ings. Most rejected papers were not considered of a low scientific quality, but the 
Program Committee members judged that the required revisions on the texts 
could not be completed in the time required to allow editing of the paper before 
the final presentation. 

This LNAI volume includes the selected papers, sorted into topics respecting 
the sequence of presentations at the conference as much as possible. The reader 
can evaluate if and how the suggestions from the Committee concerning the 
focus were influential for the Authors. 

AIMSA 2000 invited two internationally recognized scientists: Enrico Motta 
from the Knowledge Media Laboratory, The Open University, Milton Keynes, 
UK, for his work on ontologies and knowledge reuse, and Christian Queinnec 
from the Laboratoire d’Informatique de Paris VI (LIP6), France for his contri- 
butions to Web processing by reinterpreting functional language primitives in 
terms of navigation on the Web. 

The Program and the Organizing Committees were very honored and satis- 
fied with the success of the conference. AIMSA 2000 achieved the important goal 
of supporting advanced research activities, collaborations and joint endeavors 
between current and future leading scientists in artificial intelligence. In particu- 
lar, AIMSA 2000 contributed to the peaceful and productive east-west European 
unification process, connecting countries from the North and the South of the 
world as well, around the common interest of collaborative scientific discovery 
and cultural development. 




We are pleased to thank the authors for their significant efforts, both in im- 
proving the content and the form of their papers. We are grateful to the Program 
Committee members: all have been very collaborative and have taken great care 
in their evaluations. Authors and Program Committee members, as well as an
extremely active Organizing Committee, have made it possible to believe that
competence and skill in AI research are still considered very attractive
in the year 2000, even when the explosive growth of business interests around
information and telecommunication applications worldwide seems to draw our
students and young colleagues irresistibly from their early academic
experience directly into business.

Abstract. This paper has two main objectives. One is to show that
the dynamic knowledge representation paradigm introduced in [ALP+00]
and the associated language LUPS, defined in [APPP99], constitute natural,
powerful and expressive tools for representing dynamically changing
knowledge. We do so by demonstrating the applicability of the dynamic 
knowledge representation paradigm and the language LUPS to several 
broad knowledge representation domains, for each of which we provide 
an illustrative example. 

Our second objective is to extend our approach to allow proper handling 
of conflicting updates. So far, our research on knowledge updates was 
restricted to a two-valued semantics, which, in the presence of conflict- 
ing updates, leads to an inconsistent update, even though the updated 
knowledge base does not necessarily contain any truly contradictory in- 
formation. By extending our approach to the three-valued semantics we 
gain the added expressiveness allowing us to express undefined or non- 
committal updates. 

Keywords: Updates of Knowledge Bases, Dynamic Knowledge Rep- 
resentation, Generalized Logic Programs, Theory of Actions. 



1 Introduction 

One of the fundamental issues in artificial intelligence is the problem of
knowledge representation. Intelligent machines must be provided with a precise
definition of the knowledge that they possess, in a manner which is independent of
procedural considerations, context-free, and easy to manipulate, exchange and
reason about.

* This work was partially supported by PRAXIS XXI project MENTAL, and a NATO
scholarship while L. M. Pereira was on leave at the Department of Computer Science,
University of California, Riverside. We thank João Leite for helpful discussions.

Any comprehensive approach to knowledge representation has to take into
account the inherently dynamic nature of knowledge. As new information is ac- 
quired, new pieces of knowledge need to be dynamically added to or removed 
from the knowledge base. Such knowledge updates often not only significantly 
modify but outright contradict the information stored in the original knowledge 
base. We must therefore be able to dynamically update the contents of a knowl- 
edge base KB and generate a new, updated knowledge base KB* that should 
possess a precise meaning and be efficiently computable. 

1.1 Dynamic Knowledge Representation 

In [ALP+00] we proposed a comprehensive solution to the problem of knowledge
base updates. Given the original knowledge base KB, and a set of update rules
represented by the updating knowledge base KB', we defined a new updated
knowledge base $KB^* = KB \oplus KB'$ that constitutes the update of the
knowledge base KB by the knowledge base KB'. In order to make the meaning of
the updated knowledge base $KB \oplus KB'$ declaratively clear and easily verifiable,
we provided a complete semantic characterization of the updated knowledge
base $KB \oplus KB'$. It is defined by means of a simple, linear-time transformation
of the knowledge bases KB and KB' into a normal logic program written in
a meta-language. As a result, not only can the update transformation be
accomplished very efficiently, but query answering in $KB \oplus KB'$ is also reduced to
query answering about normal logic programs. The implementation is available
at: http://centria.di.fct.unl.pt/~jja/updates/.

Forthwith, we extended the notion of a single knowledge base update to
updates of sequences of knowledge bases, defining dynamic knowledge base
updates. The idea of dynamic updates is very simple and yet quite fundamental.
Suppose we are given a set of knowledge bases KB_s. Each knowledge base KB_s
constitutes a knowledge update that occurs at some state s. Different states s
may represent different time periods or different sets of priorities or perhaps
even different viewpoints. The individual knowledge bases KB_s may therefore
contain mutually contradictory as well as overlapping information. The role of
the dynamic update KB* of all the knowledge bases $\{KB_s : s \in S\}$, denoted by
$\bigoplus \{KB_s : s \in S\}$, is to use the mutual relationships existing between different
knowledge bases (as specified by the ordering relation on $s \in S$) to precisely
determine the declarative as well as the procedural semantics of the combined
knowledge base, composed of all the knowledge bases $\{KB_s : s \in S\}$.

Consequently, the notion of a dynamic program update allows us to represent 
dynamically changing knowledge and thus introduces the important paradigm 
of dynamic knowledge representation. 



Dynamic Knowledge Representation and Its Applications 






1.2 Language for Dynamic Representation of Knowledge 

Knowledge evolves from one knowledge state to another as a result of knowledge
updates. Without loss of generality we can assume that the initial, default
knowledge state, KS_0, is empty.¹ Given the current knowledge state KS, its successor
knowledge state KS' = KS[KB] is generated as a result of the occurrence of
a non-empty set of simultaneous (parallel) updates, represented by the updating
knowledge base KB. Consecutive knowledge states KS_n can therefore be
represented as KS_0[KB_1][KB_2]...[KB_n], where KS_0 is the default state and the KB_i's
represent consecutive updating knowledge bases. Using the previously introduced
notation, the n-th knowledge state KS_n is denoted by $KB_1 \oplus KB_2 \oplus \ldots \oplus KB_n$.

Dynamic knowledge updates, as described above, did not provide any language
for specifying (or programming) changes of knowledge states. Accordingly,
in [APPP99] we introduced a fully declarative, high-level language for knowledge
updates called LUPS (“Language of UPdateS”) that describes transitions
between consecutive knowledge states KS_n. It consists of update commands,
which specify what updates should be applied to any given knowledge state
KS_n in order to obtain the next knowledge state KS_{n+1}. In this way, update
commands allow us to implicitly determine the updating knowledge base KB_{n+1}.
The language LUPS can therefore be viewed as a language for dynamic knowledge
representation. Below we provide a brief description of LUPS that does
not include all of the available update commands and omits some details. The
reader is referred to [APPP99] for a detailed description.

The simplest update command consists of adding a rule to the current knowledge
state and has the form: assert (L <- L_1, ..., L_k). For example, when a law
stating that abortion is illegal is adopted, the knowledge state might be updated
via the command: assert (illegal <- abortion).

In general, the addition of a rule to a knowledge state may depend upon 
some preconditions being true in the current state. To allow for that, the assert 
command in LUPS has a more general form: 

assert (L <- L_1, ..., L_k) when (L_{k+1}, ..., L_m)    (1)

The meaning of this assert command is that if the preconditions L_{k+1}, ..., L_m
are true in the current knowledge state, then the rule L <- L_1, ..., L_k should hold
true in the successor knowledge state. Normally, the so added rules are inertial, 
i.e., they remain in force from then on by inertia, until possibly defeated by some 
future update or until retracted. 

However, in some cases the persistence of rules by inertia should not be 
assumed. Take, for instance, the simple fact alarm_ring. This is likely to be
a one-time event that should not persist by inertia after the successor state.
Accordingly, the assert command allows for the keyword event, indicating that
the added rule is non-inertial. Assert commands thus have the form (1) or:²

assert event (L <- L_1, ..., L_k) when (L_{k+1}, ..., L_m)    (2)

¹ And thus in KS_0 all predicates are false by default.

² In both cases, if the precondition is empty we just skip the whole when subclause.

Update commands themselves (rather than the rules they assert) may either be
one-time, non-persistent update commands or they may remain in force until
cancelled. In order to specify such persistent update commands (which we call
update laws) we introduce the syntax:

always [event] (L <- L_1, ..., L_k) when (L_{k+1}, ..., L_m)    (3)

To cancel persistent update commands, we use:

cancel (L <- L_1, ..., L_k) when (L_{k+1}, ..., L_m)    (4)

To deal with rule deletion, we employ the retraction update command:

retract (L <- L_1, ..., L_k) when (L_{k+1}, ..., L_m)    (5)

meaning that, subject to precondition L_{k+1}, ..., L_m, the rule L <- L_1, ..., L_k is
retracted. Note that cancellation of a persistent update command is very different 
from retraction of a rule. Cancelling a persistent update means that the given 
update command will no longer continue to be applied, but it does not remove 
any inertial effects of the rules possibly asserted by its previous application(s). 

2 Application Domains 

In this section we discuss and illustrate by examples the applicability of the 
dynamic knowledge representation paradigm and the language LUPS to several 
broad knowledge representation domains. 

2.1 Reasoning about Actions 

An exceptionally successful effort has been made lately in the area of reasoning
about actions. Beginning with the seminal paper by Gelfond and Lifschitz [GL93],
introducing a declarative language for talking about effects of actions (action
language A), through the more recent paper of Giunchiglia and Lifschitz [GL98b],
setting forth an enhanced version of the language (the so-called language C), a
number of very interesting results have been obtained by several researchers,
significantly moving forward our understanding of actions, causality and effects
of actions (see the survey paper [GL98a] for more details on action languages).

The theory of actions is very closely related to knowledge updates. An action 
taking place at a specific moment of time may cause an effect in the form of 
a change of the status of some fluent. The effect can therefore be viewed as a 
simple (atomic) knowledge update triggered by a given action. Similarly, a set 
of parallel actions can be viewed as triggering (causing) parallel atomic updates. 
The following suitcase example illustrates how LUPS can be used to handle 
parallel updates. 

Example 1 (Suitcase). There is a suitcase with two latches which opens when- 
ever both latches are up, and there is an action of toggling applicable to each 
latch [Lin95]. This situation is represented by the three persistent rules:

always (open <- up(l1), up(l2))

always (up(L)) when (not up(L), toggle(L))

always (not up(L)) when (up(L), toggle(L))

In the initial situation l1 is down, l2 is up, and the suitcase is closed:

KS_1 = {assert (not up(l1)), assert (up(l2)), assert (not open)}

Suppose there are now two simultaneous toggling actions:

KS_2 = {assert event (toggle(l1)), assert event (toggle(l2))}

and afterwards another l2 toggling action: KS_3 = {assert event (toggle(l2))}.
In the knowledge state 2 we'll have up(l1), not up(l2) and the suitcase is not
open. Only after KS_3 will latch l2 be up and the suitcase open.
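
The parallel-update reading of Example 1 can be illustrated with a few lines of Python. This is only a toy sketch and not the LUPS semantics of [APPP99]: it assumes that a toggle event and its effect are collapsed into a single transition, it treats open as a derived property, and the names step and is_open are hypothetical.

```python
# Toy model of the suitcase example; a simplification, not the LUPS semantics.
LATCHES = frozenset({"l1", "l2"})


def step(up, toggles):
    """Return the set of latches that are up after applying all toggles at once.

    Both update laws read their preconditions from the *current* state `up`,
    so simultaneous toggles cannot interfere with one another.
    """
    raised = {l for l in toggles if l not in up}   # always (up(L)) when (not up(L), toggle(L))
    lowered = {l for l in toggles if l in up}      # always (not up(L)) when (up(L), toggle(L))
    return frozenset((up - lowered) | raised)      # untouched latches persist by inertia


def is_open(up):
    """open <- up(l1), up(l2)"""
    return LATCHES <= up


initial = frozenset({"l2"})                 # l1 is down, l2 is up
after_both = step(initial, {"l1", "l2"})    # two simultaneous toggling actions
after_l2 = step(after_both, {"l2"})         # another toggle of l2
print(sorted(after_both), is_open(after_both))
print(sorted(after_l2), is_open(after_l2))
```

Under these assumptions the first transition leaves only l1 up and the suitcase closed, and the second transition raises l2 and opens the suitcase, in line with the description above.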

However, there are also major differences between dynamic updates of knowl- 
edge and theories of actions. While in our approach we want to be able to update 
one knowledge base by an arbitrary set of rules that constitutes the updating 
knowledge base, action languages deal only with updates of propositional knowl- 
edge states. At the semantic level, however, the situation is not so simple. The 
main motivation behind the introduction of the language C was to be able to 
express the notion of causality. This is a very different motivation from the mo- 
tivation that we used when defining the semantics of updated knowledge bases. 

In spite of these differences, the strong similarities between the two ap- 
proaches clearly justify a serious effort to investigate the exact nature of the 
close relationship between the two research areas and between the respective 
families of languages, their syntax and semantics. 

2.2 Legal Reasoning 

Robert Kowalski and his collaborators have done truly outstanding research work on
using logic programming as a language for legal reasoning (see e.g. [Kow92]).
However, logic programming itself lacks any mechanism for expressing dynamic
changes in the law due to revisions of the law or due to new legislation. Dynamic
knowledge representation allows us to handle such changes in a very natural way
by augmenting the knowledge base only with the newly added or revised data,
and automatically obtaining the updated information as a result. We illustrate
this capability of LUPS on the following simple example. Another, slightly more
elaborate example was given in [APPP99].

Example 2 (Conscientious objector). Consider the situation where someone is 
conscripted if he is draftable and healthy. Moreover a person is draftable when 
he attains a specific age. However, after some time, the law changes and a person 
is no longer conscripted if he is indeed a conscientious objector. 

KS_1 : always (draftable(X)) when (of_age(X))
       assert (conscripted(X) <- draftable(X), healthy(X))

KS_2 : assert (healthy(a)). assert (healthy(b)). assert (of_age(b)).
       assert (consc_objector(a)). assert (consc_objector(b))

KS_3 : assert (not conscripted(X) <- consc_objector(X))

KS_4 : assert (of_age(a))

In state 3, b is subject to conscription but after the assertion his situation
changes. On the other hand, a is never conscripted.
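
The way the later rule overrides the earlier one in Example 2 can be sketched as follows. The sketch assumes a much simplified reading of updates, in which a conclusion is taken from the most recently asserted rule whose body holds; this is not the full semantics of [ALP+00], and the function holds together with its rule encoding is hypothetical.

```python
# Each rule: (state in which it was asserted, truth value of the head, head, body atoms).
RULES = [
    (1, True, "conscripted", {"draftable", "healthy"}),
    (3, False, "conscripted", {"consc_objector"}),
]


def holds(atom, facts, rules):
    """Decide `atom` from the latest rule whose body is satisfied by `facts`.

    Assumptions of this sketch: rule bodies mention only given facts, at most
    one rule per state concludes about the same atom, and an atom without any
    applicable rule is false by default.
    """
    applicable = [(state, value) for state, value, head, body in rules
                  if head == atom and body <= facts]
    return max(applicable)[1] if applicable else False


b_facts = {"healthy", "draftable", "consc_objector"}
before_change = [rule for rule in RULES if rule[0] < 3]
print(holds("conscripted", b_facts, before_change))  # only the rule of state 1 applies
print(holds("conscripted", b_facts, RULES))          # the rule of state 3 overrides it
```

For a, who becomes draftable only after the law has changed, the same function never returns True once the rule asserted in state 3 is present.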

In addition to providing automatic updating, dynamic knowledge represen- 
tation allows us to keep the entire history of past changes and to query the 
knowledge base at any given time in the past. The ability to keep track of all the 
past changes in the law is a feature of crucial importance in the domain of law. 
We expect, therefore, that by using LUPS as a language for representation and 
reasoning about legal knowledge we may be able to significantly improve upon 
the work based on standard logic programming. 

2.3 Software Specifications 

One of the most important problems in software engineering is the problem of 
choosing a suitable software specification language. It has been argued in several 
papers (see e.g. [L097,EDD93,FD93]) that the language of logic programming is 
a good potential candidate for the language of software specifications. However 
logic programming lacks simple and natural ways of expressing conditions that 
change dynamically and the ability to handle inconsistencies stemming from 
specification revisions. Another problem is called elaboration tolerance and re- 
quires that small modifications of informal specifications result in localized and 
simple modifications of their formal counterparts. Dynamic knowledge represen- 
tation based on generalized logic programs extends logic programming exactly 
with these two missing dynamic update features. Moreover, small informal spec- 
ification revisions require equally small modifications of the formal specification, 
while all the remaining information is preserved by inertia. The following banking
example illustrates the above claims.

Example 3 (Banking transactions). Consider a software specification for
performing banking transactions. Account balances are modelled by the predicate
balance(AccountNo, Balance). Predicates deposit(AccountNo, Amount) and
withdrawal(AccountNo, Amount) represent the actions of depositing and
withdrawing money into and out of an account, respectively. A withdrawal can only
be accomplished if the account has a sufficient balance. This simplified
description can easily be modelled in LUPS by KS_1:

always (balance(Ac, OB + Up)) when (updateBal(Ac, Up), balance(Ac, OB))
always (not balance(Ac, OB)) when (updateBal(Ac, NB), balance(Ac, OB))
assert (updateBal(Ac, -X) <- withdrawal(Ac, X), balance(Ac, OB), OB > X)
assert (updateBal(Ac, X) <- deposit(Ac, X))

The first two rules state how to update the balance of an account, given
any event of updateBal. Deposits and withdrawals are then effected, causing
updateBal.

An initial situation can be imposed via assert commands. Deposits and
withdrawals can be stipulated by asserting events of deposit/2 and withdrawal/2.
E.g.:

KS_2 : {assert (balance(1, 0)), assert (balance(2, 50))}

KS_3 : {assert event (deposit(1, 40)), assert event (withdrawal(2, 10))}

causes the balance of both accounts 1 and 2 to be 40, after state 3.
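
The arithmetic behind this result can be checked with a small Python sketch of the four rules. It is an illustration under stated assumptions rather than an implementation of LUPS: balances are kept in a dictionary, a whole batch of events is applied in one step, several events on the same account are simply summed (a case the example does not cover), and the function name update_bal is hypothetical.

```python
def update_bal(balances, deposits=(), withdrawals=()):
    """Apply one batch of simultaneous deposit and withdrawal events.

    `balances` maps account numbers to balances; events are (account, amount)
    pairs. Preconditions are checked against the old balances, mirroring the
    `when` clauses. Summing several events on one account is an assumption.
    """
    changes = {}
    for account, amount in deposits:
        # updateBal(Ac, X) <- deposit(Ac, X)
        changes[account] = changes.get(account, 0) + amount
    for account, amount in withdrawals:
        # updateBal(Ac, -X) <- withdrawal(Ac, X), balance(Ac, OB), OB > X
        if balances[account] > amount:
            changes[account] = changes.get(account, 0) - amount
    new_balances = dict(balances)  # accounts without updateBal persist by inertia
    for account, change in changes.items():
        # balance(Ac, OB + Up) replaces balance(Ac, OB)
        new_balances[account] = balances[account] + change
    return new_balances


state2 = {1: 0, 2: 50}
state3 = update_bal(state2, deposits=[(1, 40)], withdrawals=[(2, 10)])
print(state3)
```

A withdrawal that exceeds the balance derives no updateBal in this sketch, so the balance is left unchanged.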

Now consider the following sequence of informal specification revisions. De- 
posits under 50 are no longer allowed; VIP accounts may have a negative balance 
up to the limit specified for the account; account #1 is a VIP account with the 
overdraft limit of 200; deposits under 50 are allowed for accounts with negative 
balances. These can in turn be modelled by the sequence: 

KS_4 : assert (not updateBal(Ac, X) <- deposit(Ac, X), X < 50)

KS_5 : assert (updateBal(Ac, -X) <- vip(Ac, L), withdrawal(Ac, X),
               balance(Ac, B), B + L > X)

KS_6 : assert (vip(1, 200))

KS_7 : assert (updateBal(Ac, X) <- deposit(Ac, X), balance(Ac, B), B < 0)

This shows dynamic knowledge representation constitutes a powerful tool 
for software specifications that will prove helpful in the difficult task of building 
reliable and provably correct software. 
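
The combined effect of the four revisions can be summarised in a short Python sketch. It assumes that, where two rules with applicable bodies disagree, the more recently asserted rule prevails; the function allowed_change and its parameters are hypothetical and only restate the conditions of the rules asserted in KS_1 and KS_4 to KS_7.

```python
def allowed_change(balance, kind, amount, vip_limit=None):
    """Return the signed balance change derived by the revised rules, or None.

    `vip_limit` is the overdraft limit L of a vip(Ac, L) fact, or None for an
    ordinary account.
    """
    if kind == "deposit":
        # KS_4 blocks deposits under 50; KS_7, being later, re-allows them
        # for accounts with a negative balance.
        if amount >= 50 or balance < 0:
            return amount
        return None
    if kind == "withdrawal":
        if balance > amount:                                        # original rule of KS_1
            return -amount
        if vip_limit is not None and balance + vip_limit > amount:  # rule of KS_5
            return -amount
        return None
    raise ValueError(f"unknown event kind: {kind}")


print(allowed_change(40, "deposit", 20))                      # small deposit, positive balance
print(allowed_change(40, "withdrawal", 100, vip_limit=200))   # VIP overdraft
print(allowed_change(-60, "deposit", 20))                     # small deposit, negative balance
```

Each revision touches a single condition in the sketch, which mirrors the point that small informal revisions lead to small formal ones.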

3 Representation of Conflicting Knowledge 

Let us consider the following contradictory advice example, which models a sit- 
uation where an agent receives conflicting advice from two reliable authorities. 
Since the agent’s expected behaviour is not to do anything that he was advised 
not to do by a reliable authority, the agent should neither perform the given 
action nor refuse to do it. Instead, the agent should remain non-committal and 
the outcome of his decision process should therefore be undefined. 

Example 4 (Conflicting Advice). An agent receives advice from two reliable 
sources: his father and his mother. The agent’s expected behaviour is to per- 
form an action recommended by a reliable authority unless it is in conflict with 
the advice received from another authority. 

always (do(A) <- father_advises(A), not dont(A))
always (dont(A) <- mother_advises(noA), not do(A))
always (⊥ <- do(A), mother_advises(noA))
always (⊥ <- dont(A), father_advises(A))

Suppose the father advises buying stocks but the mother advises not to do so:
KS_1 = {assert event (father_advises(buy)), assert event (mother_advises(nobuy))}

In this situation, the agent is unable to choose either do(buy) or dont(buy) and,
as a result, does not perform any action whatsoever. 

The above illustrates the need for a 3-valued semantics for knowledge updates.
So far, in our research on knowledge updates, we were exclusively using
a 2-valued semantics, namely, the stable semantics [GL88], suitably extended to
the class of generalized logic programs.³ Under the 2-valued semantics, the
above situation results in an inconsistent update, because of integrity constraint
violation. In this section we extend our approach to the (3-valued) well-founded
semantics of generalized logic programs. This will enable us to model knowledge
updates with non-committal or undefined outcome, as required.

³ The class of generalized logic programs can be viewed as a special case of a yet
broader class of programs introduced earlier in [LW92].

Recall that both the dynamic updates and the LUPS semantics can be defined 
by means of linear-time transformations into generalized logic programs. The 
transformation encodes both the declarative meaning of the update and the 
inertia rules. To generalize both semantics to a 3-valued setting, one needs to 
extend the semantics of normal logic programs with default negation in the 
heads to a 3-valued setting. The resulting update program semantics is based on
the well-founded semantics instead of the stable models. Accordingly, below we
generalize the well-founded semantics of normal logic programs to generalized
logic programs.

We start by presenting the definition of the stable model semantics of
generalized logic programs.⁴

Definition 1 (Generalized Logic Program). A generalized logic program P
in the language $\mathcal{L}$ is a set of rules of the form $L \leftarrow L_1, \ldots, L_n$, where L and L_i
are literals. A literal is either an atom A or its default negation not A. Literals 
of the form not A are called default literals. If none of the literals appearing in 
heads of rules of P are default literals, then the logic program P is normal. 

In order to define the semantics of generalized programs, we start by elimi- 
nating all default literals in the heads of rules. 

Definition 2. Let $\overline{\mathcal{L}}$ be the language obtained from the language $\mathcal{L}$ of a
generalized logic program P by adding, for each propositional symbol A, the new symbol
$\overline{A}$. $\overline{P}$ is the normal program obtained from the generalized program P through
replacing every negative head not A by $\overline{A}$.

The definition of the stable models of generalized programs can now be obtained
from the stable models of the program $\overline{P}$. The idea is quite simple: since $\overline{P}$ is
a normal program, its stable models can be identified via the usual definition
by means of the Gelfond-Lifschitz operator $\Gamma$ [GL88]; afterwards, all that remains
to be done is to interpret the $\overline{A}$ atoms in the stable models of $\overline{P}$ as the default
negation of A. Since atoms of the form $\overline{A}$ never appear in the body of rules of
$\overline{P}$, this task is trivial: if $\overline{A}$ is true in a stable model then not A must also be true
in it (i.e. A cannot belong to the stable model); if $\overline{A}$ is false in a stable model,
then no rule in P concludes not A, and so the valuation of A in the stable model
is independent of the existence of rules for $\overline{A}$.

Definition 3 (Stable models of generalized programs). Let P be a
generalized logic program, and let I be a stable model of $\overline{P}$ (i.e. I be such that
$I = \Gamma_{\overline{P}}(I)$) such that for no atom A both A and $\overline{A}$ belong to I. The model M,
obtained from I by deleting from it all atoms of the form $\overline{A}$, is a stable model
of P.

⁴ The definition below is different from the original one in [ALP+00], but their
equivalence can easily be shown given the results in [DP96].
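
Definitions 2 and 3 can be tried out on small propositional programs with the following brute-force sketch. The encoding is an assumption made for illustration: a rule is a triple (head, positive body, default-negated body), a head written "not a" is a default literal, and the new symbol $\overline{a}$ is written "~a". The function names are hypothetical, and the enumeration of all interpretations is only feasible for tiny programs.

```python
from itertools import combinations


def gamma(program, interp):
    """Gelfond-Lifschitz operator: least model of the reduct of a normal program."""
    # Drop rules with a default literal `not X` where X is in interp,
    # then drop the remaining default literals.
    reduct = [(head, pos) for head, pos, neg in program if not set(neg) & interp]
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in reduct:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return frozenset(model)


def bar(program):
    """Definition 2: replace every negative head `not A` by the new atom `~A`."""
    return [("~" + head[4:] if head.startswith("not ") else head, pos, neg)
            for head, pos, neg in program]


def stable_models(gen_program):
    """Definition 3, by enumerating every candidate interpretation."""
    p = bar(gen_program)
    atoms = sorted({h for h, _, _ in p} | {a for _, pos, neg in p for a in pos + neg})
    models = []
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            i = frozenset(candidate)
            if gamma(p, i) != i:
                continue  # not a stable model of the transformed program
            if any(a.startswith("~") and a[1:] in i for a in i):
                continue  # contains both A and ~A
            models.append({a for a in i if not a.startswith("~")})
    return models


# P = {not a;  a <- not b;  b <- not a}
program = [("not a", (), ()), ("a", (), ("b",)), ("b", (), ("a",))]
print(stable_models(program))
```

For this program the transformed program has two stable models, {a, ~a} and {b, ~a}; the first is discarded because it contains both a and ~a, so the only stable model of P is {b}.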

Now, a naive extension of the well-founded semantics to generalized programs
would simply consider the fixpoints of the compound operator $\Gamma^2$ for the
transformed programs $\overline{P}$, and then remove all fixpoints where, for some atom, both A
and $\overline{A}$ held. In fact, for normal programs, the least fixed-point of $\Gamma^2$
characterizes the well-founded semantics. However, this naive definition does not engender
intuitive results:

Example 5. Consider the generalized program $P = \{not\ a;\ a \leftarrow not\ b;\ b \leftarrow not\ a\}$.
According to the naive semantics, the well-founded model would be
{not a}. In this case, since not a is true, one would expect b to be true as well.

In the definition of stable models for generalized programs, whenever an atom
$\overline{A}$ is true in some interpretation I (in the extended language $\overline{\mathcal{L}}$), and hence by
definition $A \notin I$, it is guaranteed that after applying the $\Gamma$-operator once all
occurrences of not A are removed from rule bodies. In other words, whenever
$\overline{A}$ is true, A is assumed false by default in rule bodies. In the well-founded
semantics, one must ensure that, whenever $\overline{A}$ belongs to a fixed-point of $\Gamma^2$, all
literals not A in the bodies must be true. In other words, whenever $\overline{A}$ belongs to
a fixed-point I of $\Gamma^2$, A must not belong to $\Gamma(I)$. This is achieved by resorting
to the semi-normal version of the program:

Definition 4 (Semi-normal program). The semi-normal version $P_s$ of a
normal program P is obtained by adding to the body of each rule in P with head
A (resp. $\overline{A}$) the literal not $\overline{A}$ (resp. not A).

Definition 5 (Partial stable models of generalized programs). Let I be
a set of atoms in the language $\overline{\mathcal{L}}$ such that:

(1) $I = \Gamma_{\overline{P}}(\Gamma_{\overline{P}_s}(I))$ and (2) $I \subseteq \Gamma_{\overline{P}_s}(I)$

The 3-valued model $M = T \cup not\ F$ is a partial stable model of the program
P, where not {A_1, ..., A_n} stands for {not A_1, ..., not A_n}, and T is obtained
from I by deleting all atoms of the form $\overline{A}$ and F is the set of all atoms A that
do not belong to $\Gamma_{\overline{P}_s}(I)$.

With this definition there is no need to explicitly discard interpretations
comprising both A and $\overline{A}$ for some atom A. These are already filtered by
condition (2). Indeed, if both A and $\overline{A}$ belong to I then, because in $\overline{P}_s$ all rules with
head A (respectively, $\overline{A}$) have not $\overline{A}$ (respectively, not A) in the body, neither A
nor $\overline{A}$ belongs to $\Gamma_{\overline{P}_s}(I)$, and thus condition (2) will fail to hold.

Definition 6 (Well-founded model of generalized programs). The
well-founded model of a generalized program P is the set-inclusion least partial
stable model of P, and is obtainable by iterating the (compound) operator $\Gamma_{\overline{P}}\Gamma_{\overline{P}_s}$
starting from {}, and constructing M from the so obtained least fixpoint.

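Example 6. The well-founded model of the program in Example 5 is {b, not a}.
In fact, $\Gamma_{\overline{P}_s}(\{\}) = \{a, b, \overline{a}\}$, $\Gamma_{\overline{P}}(\{a, b, \overline{a}\}) = \{\overline{a}\}$, $\Gamma_{\overline{P}_s}(\{\overline{a}\}) = \{b, \overline{a}\}$, $\Gamma_{\overline{P}}(\{b, \overline{a}\}) = \{b, \overline{a}\}$.
Accordingly, its well-founded model is {b, not a}. Note, in the 3rd application
of the operator, how the semi-normality of $\overline{P}_s$ is instrumental in guaranteeing
truth of b.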
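
The iteration of Definition 6 can be reproduced mechanically. The sketch below is a minimal propositional illustration under assumed conventions: a rule is a triple (head, positive body, default-negated body), a head written "not a" is a default literal, the new symbol $\overline{a}$ is written "~a", and all function names are hypothetical. For comparison it also computes the least fixpoint of the naive operator discussed before Example 5.

```python
def gamma(program, interp):
    """Gelfond-Lifschitz operator: least model of the reduct of a normal program."""
    reduct = [(head, pos) for head, pos, neg in program if not set(neg) & interp]
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in reduct:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return frozenset(model)


def bar(program):
    """Definition 2: replace every negative head `not A` by the new atom `~A`."""
    return [("~" + head[4:] if head.startswith("not ") else head, pos, neg)
            for head, pos, neg in program]


def semi_normal(program):
    """Definition 4: add `not ~A` to rules for A, and `not A` to rules for ~A."""
    def complement(atom):
        return atom[1:] if atom.startswith("~") else "~" + atom
    return [(head, pos, neg + (complement(head),)) for head, pos, neg in program]


def least_fixpoint(operator):
    """Iterate a monotone operator on finite sets, starting from the empty set."""
    current = frozenset()
    while True:
        following = operator(current)
        if following == current:
            return current
        current = following


def well_founded(gen_program, atoms):
    """Definitions 5 and 6: return the true atoms T and the false atoms F."""
    p = bar(gen_program)
    ps = semi_normal(p)
    i = least_fixpoint(lambda x: gamma(p, gamma(ps, x)))
    true = {a for a in i if not a.startswith("~")}
    false = set(atoms) - gamma(ps, i)
    return true, false


def naive_fixpoint(gen_program):
    """Least fixpoint of the naive compound operator, without semi-normality."""
    p = bar(gen_program)
    return least_fixpoint(lambda x: gamma(p, gamma(p, x)))


# P = {not a;  a <- not b;  b <- not a}
program = [("not a", (), ()), ("a", (), ("b",)), ("b", (), ("a",))]
print(well_founded(program, {"a", "b"}))
print(naive_fixpoint(program))
```

On this program the sketch yields T = {b} and F = {a}, that is, the model {b, not a}, whereas the naive iteration stops at {~a} and leaves b undefined.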






José Júlio Alferes et al.



4 Concluding Remarks 

While LUPS constitutes an important step forward towards defining a powerful 
and yet intuitive and fully declarative language for dynamic knowledge represen- 
tation, it is by far not a finished product. There are a number of update features 
that are not yet covered by its current syntax as well as a number of additional 
options that should be made available for the existing commands. Further im- 
provement, extension and application of the LUPS language remains therefore 
one of our near-term objectives. 

Abstract. By expert information we mean the information given by
a human expert in a field of science and technology, or by some intelligent
program (for example an intelligent agent), when solving some task. We
assume that in an intelligent distributed system the system sites may
generate different solutions for the same task, and the problem for the
management system is to determine one proper solution for the task. In
this paper we propose solving this problem by determining the consensus
of the given solutions and treating it as the final solution. We present a general
consensus problem, the postulates for consensus choice and their
analysis. This analysis shows that the final solution, being the consensus
of the given solutions, should be the most credible solution in an uncertain
situation.



1 Introduction 

Distributed intelligent systems consist of autonomous sites, and this autonomy
is the source of conflicts in which the expert information generated by the
sites on some matter is inconsistent. By expert information we mean the
information given by a human expert in a field of science and technology, or by some
intelligent program (for example an intelligent agent), when solving some task. We
assume that in an intelligent distributed system the system sites may
generate different solutions for the same task, and the problem is how to determine
one proper solution for the task. Generally, this kind of situation is related to
preserving data consistency. For non-distributed systems it seems that the problem is
solved naturally by their integrity constraints; in distributed systems, however (even
with homogeneous databases), the data inconsistency problem is more complicated [3,4,19].
Below we present some examples.

As the first example, let us consider a distributed database system for a bank.
Assume that a client of the bank takes out credits in different branches and pays them
off progressively. Thus in these branches different opinions about the credibility of the
client may arise. If, in the future, this client wants to take out another credit, then a
univocal opinion would be useful for the bank in making a decision. The second
example refers to time indeterminacy [10]. Let us consider a distributed system whose
sites' tasks are based on monitoring meteorological situations in their regions, and
forecasting, for example, a period of rain. If the regions covered by these sites overlap,
then it is possible that the sites give different (even contradictory) forecasts for the
common towns. Of course these data are still consistent from the point of view of
database integrity constraints. However, when a forecast is needed for the
whole country, the management system must create a view of the fragments generated by
the sites, and in this view the data may be inconsistent.

S. A. Cerri and D. Dochev (Eds.): AIMSA 2000, LNAI 1904, pp. 11-20, 2000.

The above examples show that in intelligent distributed systems conflict situations
may arise in which different versions of information are generated for the same
matter. In such cases the management system must determine the version of the
information that should be regarded as the proper one. In this paper we propose a tool
for resolving this kind of conflict. These conflicts refer to data semantics in
distributed database systems. The tool proposed here consists of consensus methods for
data analysis. The version of data that is a consensus of the given versions should be
the most credible one. Generally, a consensus of given versions of data (some of which
may contradict each other) should be chosen only if these versions refer to the same
subject and it is not possible to re-create the proper version on the basis of certain
and exact information. The intention of the author is to present a formal and general
problem of consensus choice (Section 3). In Section 4 the postulates for consensus and
their analysis are presented. This analysis shows how to choose a consensus satisfying
fixed properties. A numerical example is also given.



2 Related Works 

Consensus theory has its roots in choice theory. A choice from some set A of
alternatives is based on a relation $\alpha$ called a preference relation. Owing to it
the choice function may be defined as follows:

$C(A) = \{x \in A : (\forall y \in A)((x, y) \in \alpha)\}$

Many works have dealt with the special case where the preference relation is
determined on the basis of a linear order on A. The most popular were the Condorcet
choice functions. A choice function is called a Condorcet function if [12]:

$x \in C(A) \Leftrightarrow (\forall y \in A)(x \in C(\{x, y\}))$
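
As an illustration only, the two definitions above can be sketched in Python for a finite set of alternatives. The function names (`choice`, `is_condorcet`), the encoding of the preference relation as a set of ordered pairs, and the sample data are assumptions made for this sketch and do not come from the cited works.

```python
def choice(alternatives, prefers):
    """C(A): every x in A with (x, y) in the preference relation for all y in A."""
    return {x for x in alternatives
            if all((x, y) in prefers for y in alternatives)}


def is_condorcet(choice_fn, alternatives):
    """Check that x in C(A) <=> x in C({x, y}) for every y in A."""
    chosen = choice_fn(alternatives)
    for x in alternatives:
        wins_every_pair = all(x in choice_fn({x, y}) for y in alternatives)
        if (x in chosen) != wins_every_pair:
            return False
    return True


# Hypothetical data: the linear order a > b > c as a reflexive relation.
A = {"a", "b", "c"}
alpha = {("a", "a"), ("b", "b"), ("c", "c"),
         ("a", "b"), ("a", "c"), ("b", "c")}

relation_choice = lambda s: choice(s, alpha)
# A deliberately non-Condorcet function: it accepts everything in small sets.
lenient_choice = lambda s: set(s) if len(s) <= 2 else {max(s)}
```

Here `relation_choice` passes the Condorcet check, while `lenient_choice` fails it, because an alternative that survives every pairwise choice is not necessarily chosen from the whole set.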

In consensus-based research, however, it is assumed that the chosen alternatives do
not have to be included in the set presented for choice, thus C(A) need not be a
subset of A. At the beginning of this research the authors dealt only with simple
structures of the set A (named macrostructure), such as a linear or partial order.
Later, with the development of computing techniques, the structure of each alternative
(named microstructure) was also investigated. Most often the authors assume that all
the alternatives have the same microstructure. On the basis of the microstructure one
can determine a macrostructure of the set A. Among others, the following
microstructures have been investigated: linear orders [1], ordered set partitions
[6,7], non-ordered set partitions [9], n-trees [9], and time intervals [17]. The
following macrostructures have been considered: linear orders and distance (or
similarity) functions. The consensus of the set A is most often determined on the
basis of its macrostructure by some optimality rules. If the macrostructure is a
distance (or similarity) function, then Kemeny's median [1] is very often used to
choose the consensus. According to Kemeny's rule the consensus should be nearest to
the elements of the set A.
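
A minimal sketch of such a median-style rule is given below, under the assumption that "nearest" means minimising the sum of distances to the given elements. The function name `median_consensus`, the candidate set and the sample data are hypothetical; the candidates are passed explicitly because the consensus need not belong to the set presented for choice.

```python
def median_consensus(profile, candidates, distance):
    """Return all candidates whose summed distance to the profile is minimal."""
    totals = [(sum(distance(c, x) for x in profile), c) for c in candidates]
    best = min(total for total, _ in totals)
    return [c for total, c in totals if total == best]


# Hypothetical example: alternatives are integers, distance is |x - y|.
def absolute(x, y):
    return abs(x - y)


odd_profile = median_consensus([1, 2, 9], range(0, 11), absolute)
even_profile = median_consensus([1, 4], range(0, 6), absolute)
```

For the second profile several candidates tie, including values that none of the voters proposed, which illustrates why C(A) need not be a subset of A.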

Consensus as a tool for analysing experts' classifications has been investigated in
works [5,14,15], in which the authors dealt with structures such as ordered
partitions, ordered coverings and non-ordered partitions of a set.

In the field of distributed systems, consensus seems to be an efficient method for
restoring the consistency of replicated data [8] or for resolving conflicts among
agents [11,16]. Many works on fault tolerance have used consensus methods. Among
others, a consensus problem was formulated and solved for asynchronous systems where
processors can crash and recover [13]. In [21] the authors propose a protocol that
tolerates link faults by determining a consensus of different possibilities of
failure. Solving the consensus problem in a mobile environment is investigated in [2].
An anatomy of conflicts is shown in [20], which presents a formal model and measures
for conflicts.



3 Structure of Consensus 

3.1 Basic Notions

We assume that some real world is investigated by sites of a distributed system. This
real world is represented by a set of objects that may be classified into more than
one group (for example a group of people, a group of buildings, etc.). The subject of
investigation of the sites is a set of features (relations) which can be possessed by
the real world objects. The relationships between the features should be represented
by logic formulas (they can be interpreted similarly to integrity constraints for
databases).

Thus by a structure of consensus we mean an extended relation system

Consensus_Str = (X, F, R, Z)

where

- X: finite set of consensus carriers

- F: finite set of functions

- R: finite set of relations on carriers

- Z: set of logic formulas which must be true in the model (X, F, R).

The idea of this formalism is to include all the information about the situation
which requires consensus choice and all circumstances needed for this process. The aim
of building the structure is to include in one system all information needed for
consensus choice and to enable the definition of a formal language which should serve
for the implementation of the choice. The following example explains the above notions
further.
Example. Let us take a multiagent system functioning in a distributed environment.
The agents' tasks are based on the analysis of meteorological data and on making
weather forecasts for their regions for the next day. If the regions covered by
several agents overlap, then the versions of the forecast for an overlapping
sub-region may differ from each other. Thus we deal with a conflict in determining one
version of the forecast for the common region. Assume that the weather forecast refers
to the degree and occurrence time of such phenomena as rain, snow, sun, temperature,
etc. We define the above notions as follows:

- $X = \{Agent, Region, Degree, Time, Temp\}$,

where

$Agent = \{a_1, a_2, a_3, a_4\}$,

$Region = \{r_1, r_2, \ldots, r_{10}\}$,

$Degree = [0,1]$,

Time = set of time intervals represented by the identifiers of time chronons (e.g.
[2000-09-20:5AM - 2000-09-20:11AM], or [5AM - 11AM] if the day is known),

Temp = set of intervals of whole numbers representing degrees Celsius, for
example [3,6] or [-10,-1].

The names of the carriers are also used as attribute names.

- $F = \{Credibility\}$

where $Credibility : Agent \to Degree$ is a function which assigns to each agent a
degree of credibility, because, for example, depending on how modern their devices
are, the credibility degrees of agents may differ from each other.

- $R = \{Rain^+, Rain^-, Snow^+, Snow^-, Sun^+, Sun^-, Temperature^+, Temperature^-\}$

where

$Rain^+, Rain^- \subseteq Agent \times Region \times Time$

$Snow^+, Snow^- \subseteq Agent \times Region \times Time$

$Sun^+, Sun^- \subseteq Agent \times Region \times Time$

$Temperature^+, Temperature^- \subseteq Agent \times Region \times Temp$
In this example the relations Rain and Temperature are presented as follows: 

Relation $Rain^+$

Agent   Region   Time
a1      r1       3AM-10AM
a1      r2       7AM-10AM
a1      r3       2PM-6PM
a2      r3       5AM-8AM
a2      r4       3AM-10AM
a2      r5       7AM-11AM
a3      r6       4AM-8AM
a3      r7       3AM-8AM
a3      r8       2PM-8PM
a4      r8       6AM-5PM
a4      r9       2AM-8AM
a4      r10      3AM-7AM

Relation $Rain^-$

Agent   Region   Time
a1      r1       3PM-8PM
a1      r2       11AM4PM
a1      r3       5AM-11AM
a2      r3       3PM-8PM
a2      r4       10AM4PM
a2      r5       12AM4PM
a3      r6       10AM-12PM
a3      r7       1PM4PM
a3      r8       10AM-12AM
a4      r8       6PM-11PM
a4      r9       12AM-5PM
a4      r10      8AM4PM
Relation $Temperature^+$

Agent   Region   Temp
a1      r1       3-6
a1      r2       4-8
a1      r3       24
a2      r3       2-6
a2      r4       3-8
a2      r5       4-10
a3      r6       24
a3      r7       (-1)-8
a3      r8       (-1)-5
a4      r8       24

Relation $Temperature^-$

Agent   Region   Temp
a1      r1       >1
a1      r2       >10
a1      r3       >7
a2      r3       >8
a2      r4       <0
a2      r5       >7
a3      r6       >8
a3      r7       >9
a3      r8       >8
a4      r8       >7



We interpret a tuple (for example <$a_1$, $r_1$, 3AM-10AM>) of relation $Rain^+$ as
follows: according to the forecast of agent $a_1$, in region $r_1$ rain will fall
between 3AM and 10AM. A tuple <$a_1$, $r_1$, 3PM-8PM> of relation $Rain^-$ is
interpreted as follows: according to the forecast of agent $a_1$, in region $r_1$ rain
will not fall between 3PM and 8PM. It means that for the rest of the next day (viz.
between 0AM and 3AM, 10AM and 3PM, 8PM and 12PM), agent $a_1$ has no basis to state
whether rain will fall in region $r_1$ or not. The interpretation of the other
relations is similar.

Logic formulas from set Z have to express the conditions which should be satisfied
by relations from set R. In this case we have the following formulas of first order
logic:

$Z = \{Sun(a,r,t,d) \Rightarrow (t > 6\mathrm{AM} \wedge t < 6\mathrm{PM}),\ Rain(a,r,t,d) \Rightarrow \neg Sun(a',r,t,d')\}$

The first formula represents the condition that if there is sun in region r at time
t, then the time must be between 6AM and 6PM; this condition depends, of course, on
the given month. The second formula requires that if there is rain in region r at
time t, then there may not be sun at the same time.
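
The two conditions described above can be checked mechanically on forecast tuples. The following is only a sketch under stated assumptions: tuples are encoded as (agent, region, (start, end)) with whole hours counted from midnight, two intervals clash when they share more than an end point, and the function names and sample forecasts are hypothetical.

```python
def sun_outside_daylight(sun, start=6, end=18):
    """Sun tuples whose interval is not contained in [start, end]."""
    return [t for t in sun if not (start <= t[2][0] and t[2][1] <= end)]


def rain_sun_clashes(rain, sun):
    """Pairs (rain tuple, sun tuple) for the same region with overlapping time."""
    return [(r, s) for r in rain for s in sun
            if r[1] == s[1] and max(r[2][0], s[2][0]) < min(r[2][1], s[2][1])]


# Hypothetical forecasts, hours counted from midnight.
rain = [("a1", "r1", (3, 10))]
sun = [("a2", "r1", (8, 12)), ("a2", "r2", (5, 9))]

daylight_violations = sun_outside_daylight(sun)
clashes = rain_sun_clashes(rain, sun)
```

In this sample the sun forecast for r2 starts before 6AM and therefore breaks the first condition, while the rain and sun forecasts for r1 overlap between 8AM and 10AM and break the second.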



3.2 Basis of Consensus 

As the foundation of consensus choice we take the set R of relations on the carriers
of the relation system. We first define a consensus resource Re as follows:

$Re = \{P^+, P^{\pm}, P^- : P^+, P^- \in R \text{ and } P^{\pm} = Dom(P) \setminus (P^+ \cup P^-)\}$,

where Dom(P) is the whole Cartesian product of the carriers on which relations $P^+$
and $P^-$ are defined. For example $Dom(Rain) = Agent \times Region \times Time$. Next
we define a consensus domain as a pair of two sets: a set of relation names and a set
of consensus relationships between attributes, for example

$\langle \{Rain^+, Rain^{\pm}, Rain^-\}, \{Region \to Time\} \rangle$, or
$\langle \{Rain^+\}, \{Region \to Time\} \rangle$.

The first component (a subset of the consensus resource) of the above tuples is called
a consensus basis, and the second component is called a consensus subject. We
interpret these components as follows: the consensus basis consists of all the
information needed for consensus choice; by a consensus subject $Region \to Time$ we
mean that for a given value of attribute Region there should be only one value of
attribute Time. Thus a consensus resource should include all information needed for
different choices of consensus, while a consensus domain presents the basis from which
the consensus should be chosen and the subject to which the consensus refers.

If the consensus basis is of the form $\{P^+, P^{\pm}, P^-\}$ then its elements are
called:

- $P^+$: positive component

- $P^{\pm}$: uncertain component

- $P^-$: negative component

The interpretation of elements of relations $P^+$ and $P^-$ is given above. Notice
that set $P^{\pm}$ is also a relation on the same carriers as relations $P^+$ and
$P^-$, and its elements should be interpreted as the uncertainty of the agents. For
example, if tuple <$a_1$, $r_1$, 0AM-3AM> belongs to relation $Rain^{\pm}$, it means
that agent $a_1$ has no grounds to state whether or not it will rain in region $r_1$
between 0AM and 3AM. The concept of interpretation of relations $P^+$, $P^{\pm}$ and
$P^-$ is adopted from [20]. Notice also that sets $P^+$, $P^{\pm}$, $P^-$ are pairwise
disjoint, and $P^+ \cup P^{\pm} \cup P^- = Dom(P)$; thus the set
$\{P^+, P^{\pm}, P^-\}$ is a partition of Dom(P).
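
The partition property can be illustrated on a small finite domain. This is a sketch under simplifying assumptions: the carriers are finite lists, time is discretised into a few hypothetical slots rather than intervals, and the name `uncertain_component` is introduced here only for illustration.

```python
from itertools import product


def uncertain_component(carriers, positive, negative):
    """Return Dom(P) minus (P+ union P-) for finite carriers."""
    domain = set(product(*carriers))
    return domain - (positive | negative)


# Hypothetical finite carriers; time is discretised into four slots.
carriers = (["a1"], ["r1"], [0, 1, 2, 3])
rain_plus = {("a1", "r1", 0)}
rain_minus = {("a1", "r1", 3)}
rain_unc = uncertain_component(carriers, rain_plus, rain_minus)
```

Whenever the positive and negative components are themselves disjoint, the three resulting sets are pairwise disjoint and their union is the whole Cartesian product.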



3.3 States of a Consensus Domain 

Let $P = \{P^+, P^{\pm}, P^-\}$ be a basis of consensus. Notice that these relations
consist of a number of tuples, each of which refers to one property of the real world.
Below we define some states of a consensus domain. Here we consider only the cases
when the basis of consensus is of the form $\{P^+, P^{\pm}, P^-\}$. In what follows,
by $r_A$ we mean the value of attribute A in the tuple r belonging to some relation.

Definition 1. Consensus domain $\langle \{P^+, P^{\pm}, P^-\}, \{A \to B\} \rangle$ is
inconsistent if there exist two tuples r and r' from set $P^+$ (or $P^-$) such that
$r_A = r'_A$ but $r_B \neq r'_B$. •

Definition 2. Consensus domain $\langle \{P^+, P^{\pm}, P^-\}, \{A \to B\} \rangle$ is
contradictory if there exist two tuples $r \in P^+$ and $r' \in P^-$ such that
$r_A = r'_A$ and $r_B \cap r'_B \neq \emptyset$. •

Some commentary is needed for the above definitions. According to Definition 1, if in
a consensus domain there exist two tuples which have the same value of attribute A
(e.g. Region) but different values of attribute B (e.g. Time), then the domain is in
an inconsistent state, because for the same region there exist two different forecasts
of rainfall. Definition 2, however, treats the inconsistency more sharply: a consensus
domain is contradictory if for the same region there exist two forecasts, one of which
states that rain will fall at some time, and the second that rain will not fall at the
same time.
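
Both states can be tested directly once the tuples are restricted to the attributes A and B. The sketch below assumes tuples of the form (region, (start, end)) with whole hours counted from midnight and treats two intervals as intersecting when they share more than an end point; the function names and the sample values are hypothetical.

```python
def is_inconsistent(component):
    """Definition 1: two tuples agree on attribute A but differ on B."""
    seen = {}
    for a, b in component:
        if a in seen and seen[a] != b:
            return True
        seen.setdefault(a, b)
    return False


def overlaps(i, j):
    """True if the hour intervals i and j share more than an end point."""
    return max(i[0], j[0]) < min(i[1], j[1])


def is_contradictory(positive, negative):
    """Definition 2: some r in P+ and r' in P- agree on A and overlap on B."""
    return any(a == a2 and overlaps(b, b2)
               for a, b in positive for a2, b2 in negative)


# Hypothetical forecasts for one region, hours counted from midnight.
rain_plus = [("r3", (14, 18)), ("r3", (5, 8))]
rain_minus = [("r3", (15, 20))]
```

With these values the positive component alone is inconsistent (two different rain periods for r3), and the domain is also contradictory because one agent expects rain between 2PM and 6PM while another excludes it between 3PM and 8PM.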



4 Consensus Definition and Its Postulates 

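Let A and B be attributes; by $Dom(P)_{\{A,B\}}$ we denote the set of all tuples
belonging to Dom(P) restricted to these attributes.

Definition 3. By a consensus of domain
$\langle \{P^+, P^{\pm}, P^-\}, \{A \to B\} \rangle$ we mean a relation
$C(P) \subseteq Dom(P)_{\{A,B\}}$ which satisfies one or more of the following
conditions:

a) For $r, r' \in C(P)$, if $r_A = r'_A$ then $r_B = r'_B$,

b) One or more of the following general postulates are satisfied:

P1. $(\forall a \in A)(\exists c \in C(P))\left[(c_A = a) \Rightarrow \left(\bigcap_{r \in P^+,\ r_A = a} r_B \subseteq c_B\right)\right]$

P2. $(\forall a \in A)(\exists c \in C(P))\left[(c_A = a) \Rightarrow \left(\bigcap_{r \in P^-,\ r_A = a} r_B \not\subset c_B\right)\right]$

P3. $(\forall a \in A)(\exists c \in C(P))\left[(c_A = a) \Rightarrow \left(\bigcap_{r \in P^{\pm},\ r_A = a} r_B \not\subset c_B\right)\right]$

P4. $(\forall a \in A)(\exists c \in C(P))\left[(c_A = a) \Rightarrow \left(c_B \subseteq \bigcup_{r \in P^+,\ r_A = a} r_B\right)\right]$

P5. $(C(P) \cap C(P') \neq \emptyset) \Rightarrow (C(P \cup P') = C(P) \cap C(P'))$

P6. $(\forall a \in A\ \forall b \in B)\left[(\forall r \in C(P))(r_A = a \Rightarrow b \not\subset r_B) \Rightarrow (\exists P'\ \exists r' \in C(P \cup P'))(r'_A = a \wedge b \subseteq r'_B)\right]$ •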

The above postulates require some commentary. The first postulate states that if, for
some value a of attribute A, all the votes in their positive components of the
consensus basis qualify, among others, the same value b of attribute B, then this
qualification should also take place in the consensus. Postulates P2 and P3 treat the
negative and uncertain components of the consensus basis in a similar way. According
to postulate P4, if none of the votes qualifies a value b of attribute B to a value a
of attribute A, then no such tuple exists in the consensus. Postulate P5 states that
if the consensuses of two bases have a non-empty common part, then this part should be
the consensus of the sum of these bases. Finally, postulate P6 states that for each
tuple of attributes A and B there should exist a basis whose consensus contains the
tuple.

Each of these postulates, treated as a characteristic property of a consensus choice
function, specifies in the space C of all consensus choice functions a region, denoted
P1, P2, ..., P6 respectively. Notice that all the regions P1, P2, ..., P6 are
independent, which means that $P_i \not\subset P_j$ for all $i, j = 1, \ldots, 6$ with
$i \neq j$. Below we present some properties of these postulates.

Theorem 1. $P1 \cap P2 \cap \ldots \cap P6 \neq \emptyset$ •

Theorem 1 states a very important property of the consensus postulates, namely that
there exists at least one consensus function which satisfies all these postulates.
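Theorem 2. If a metric $\delta$ is defined between tuples of $Dom(P)_{\{A,B\}}$, then
for a given domain $\langle \{P^+, P^{\pm}, P^-\}, \{A \to B\} \rangle$ the following
consensus function

$C(P) = \left\{c \in Dom(P)_{\{A,B\}} : (c_A = a) \Rightarrow \sum_{r \in P^+,\ r_A = a} \delta(c, r) = \min_{y \in Dom(P)_{\{A,B\}}} \sum_{r \in P^+,\ r_A = a} \delta(y, r)\right\}$

satisfies the dependency $C \in (P1 \cap P2 \cap \ldots \cap P6)$. •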

The second theorem shows a very practical property of the consensus postulates, viz.
it determines a consensus function which satisfies all these postulates.

The proofs of these theorems are given in report [18].

For the above example let P = Rain, A = Region, B = Time, and let the metric $\delta$
between tuples $r, r' \in Dom(Rain)_{\{Region,Time\}}$ be defined as follows:
$\delta(r, r') = \sigma(r_{Time}, r'_{Time})$, where

$\sigma(r_{Time}, r'_{Time}) = |r_{Time*} - r'_{Time*}| + |r_{Time}^{*} - r'^{*}_{Time}|$

for time intervals $(r_{Time*}, r_{Time}^{*})$ and $(r'_{Time*}, r'^{*}_{Time})$. Next
we use the following algorithm for determining the consensus of time intervals:

Given: Set J (with repetitions) of n time intervals,
$J = \{i_j \mid i_j = (i_{j*}, i_j^{*}) \text{ for } j = 1, 2, \ldots, n\}$

Result: Consensus $i = (i_*, i^*)$ satisfying the condition
$\sum_{j=1}^{n} \sigma(i, i_j) = \min_{i' \in I} \sum_{j=1}^{n} \sigma(i', i_j)$, where
I is the set of time intervals.

Procedure:

BEGIN
  if n = 1 then set i := i_1 and go to END, else
  begin
    {Create sets with repetitions}
    X_1 := {i_{j*} | j = 1,2,...,n};    {set of lower chronons}
    X_2 := {i_j^* | j = 1,2,...,n};     {set of upper chronons}
  end;
  sort sets X_1 and X_2 in increasing order;
  for X_1 do
  begin
    k := [(n+1)/2];    {where [x] is the greatest integer not greater than x}
    k' := [n/2]+1;
  end;
  set integer i_* such that i_{k'*} >= i_* >= i_{k*};
  for X_2 do
  begin
    k := [(n+1)/2]; k' := [n/2]+1;
  end;
  set integer i^* such that i_{k'}^* >= i^* >= i_k^*;
  i := (i_*, i^*);
END.
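
The procedure can be sketched in Python as follows. The sketch assumes that intervals are pairs of whole hours counted from midnight and that, where the procedure allows any integer between the k-th and k'-th sorted chronon, the k-th one is taken; this tie-break and the names `admissible_range`, `interval_consensus` and `total_distance` are choices made here, not part of the procedure.

```python
def admissible_range(values):
    """Bounds between which any integer minimises the sum of |v - x|."""
    ordered = sorted(values)
    n = len(ordered)
    k = (n + 1) // 2        # [(n+1)/2], 1-based position
    k_prime = n // 2 + 1    # [n/2] + 1, 1-based position
    return ordered[k - 1], ordered[k_prime - 1]


def interval_consensus(intervals):
    """Consensus interval built from the medians of lower and upper chronons."""
    if not intervals:
        raise ValueError("at least one interval is required")
    lower, _ = admissible_range([lo for lo, _ in intervals])
    upper, _ = admissible_range([hi for _, hi in intervals])
    return lower, upper


def total_distance(candidate, intervals):
    """Sum of sigma(candidate, i_j) over all given intervals."""
    return sum(abs(candidate[0] - lo) + abs(candidate[1] - hi)
               for lo, hi in intervals)


# Hypothetical forecasts for one region, hours counted from midnight.
three_votes = [(3, 10), (5, 8), (7, 11)]
two_votes = [(14, 20), (6, 17)]
```

For a single interval the function returns that interval unchanged, which corresponds to the n = 1 branch of the procedure. For an even number of intervals any other choice inside the admissible range gives the same total distance.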

Thus we should have the following consensus C(Rain):

Consensus C(Rain)

Region   Time
r1       3AM-10AM
r2       7AM-10AM
r3       5AM-10AM
r4       2AM-8AM
r5       7AM-11AM
r6       4AM-8AM
r7       3AM-8AM
r8       6AM4PM
r9       2AM-8AM
r10      3AM-7AM



The algorithm for determining consensus for time intervals is given in [17].
Depending on the structure of the values of attribute B, the algorithms for
determining the values of function C defined in Theorem 2 may be more or less complex.
In [8] the authors proposed some algorithms for the case when the values of attribute
B are binary sequences which represent data in computer memory. Earlier, in [7], the
ordered covering of a set was investigated and algorithms were also proposed. These
computation problems are often NP-hard, and it is then necessary to work out
heuristics or genetic algorithms.



5 Conclusion 

From the results of the analysis of the postulates it is possible to determine a
consensus for a given conflict situation if the structure of the versions is known.
Future work should concentrate on investigating when a consensus is good enough for a
given conflict. In other words, by Theorem 2 we can always determine a consensus, but
the question is whether this consensus is sensible, that is, whether the given
conflict situation is susceptible to consensus. If possible, a measure should be
defined for a given situation which indicates how sensible a consensus choice is.
Abstract. In this paper we first describe the nature of laws and
regulations, which are not normal texts but fragmented pieces of text
that can only be understood by using some (implicit) model of the world
to be regulated. Then we describe the process of drafting regulations,
in particular the need to verify and validate their intended effects,
i.e. deontic statements. We present an ontology, FOLaw [13], and a
prototype system, TRACS (Traffic Regulation Automation and Comparison
System), which was created to test new traffic regulations [2]. Even a
few test runs showed major deficiencies in this regulation. An extended
version of TRACS also enables the generation of paraphrases of a
regulation and, to some extent, generation from scratch. The
implications of the use of this kind of tool are discussed, not only for
checking consistency, but also for aligning (“harmonizing”) regulations
of different legal systems (nations).



1 What’s in a Regulation? 

“Can you develop a computer program that can check if the new traffic regulations
are consistent and complete?” This apparently innocent question, posed by a
government agency concerned with traffic safety, SWOV¹, triggered a decade of
research at our institute, LRI.

The first and smallest step consisted of a close reading of the draft text of the
regulation. It does not take much to notice that a legal text is not a normal text.
It consists of individual statements (articles), many of which have a deontic
nature. However, we understand these statements in various ways, as can be
illustrated by the following two articles from this traffic regulation
(draft-RVV-90):

Art 3 Vehicles should keep as much as possible to the right.
Art 6 Two bicyclists may ride side by side.

We understand that Art 6 is an exception to Art 3, but we can only draw this
conclusion after we have made some spatial model from which we can see that the left
bicyclist of the two is not keeping fully to the right: the right hand bicyclist
prevents this. If this were the point, normal texts would explain this, but
regulation texts (“laws”) assume that the reader already has a full understanding,
i.e. a generic model, of the domain concerned, in this case traffic. Moreover, in
principle these statements only mark what is undesirable behaviour among all
possible behaviours [1]: that is the reason many of these statements have a deontic
meaning (‘should’, ‘may’).

¹ Stichting Wetenschappelijk Onderzoek Verkeersveiligheid; Foundation for Research
on Traffic Safety.

In view of the request of checking consistency and completeness, drafting
regulations can be compared to programming [7]. In software engineering a program is
regarded as correct when it has no errors and is effective, i.e. has been verified
and validated. The request of SWOV is a verification question. We will discuss the
validation issues later, but it is already apparent that verification is
problematic, as laws are riddled with exceptions and exceptions are logical
inconsistencies. Moreover, the notion of completeness becomes very problematic for
two reasons. First, the completeness is relative to the model of the world (domain)
to be regulated and this model is implicit and largely based on common sense (see
below). Second, it is hard to design a method to assess whether the regulation fully
covers the intentions of the legislator, as these intentions are often expressed at
a global level of desires from which the legal drafter has to infer undesirable
situations and also has to take into account all kinds of implicit, undesirable side
effects. Finally, as we will also show, undesirability of situations cannot directly
be expressed, but only via the use of the famous three deontic operators: O(bliged),
F(orbidden) and P(ermitted) for very pragmatic reasons. As we will show in the next
section, these pragmatic reasons are also the cause of the many exceptions one finds
in regulations. The completeness with respect to covering of the intended effects by
a regulation is rather a question of validation than of verification.



2 FOLaw: A Functional Ontology for Reasoning with the Law

According to Valente ([12], see also [13]) reasoning in law can be summarized by a
number of dependent functions, where each function refers to specific categories of
knowledge with specific properties. Here we can only present the global picture of
this core ontology (see Fig 1), and we will focus on world knowledge and normative
knowledge.

The basic assumptions of this core ontology are that there is a legal system that
controls social behaviour in a reactive manner. Of course, the role of law is not
only to ‘correct’ illegal behaviour, but also to instruct/prevent undesirable
behaviour. These are crude, simplified assumptions, but for details see [13].

A reactive cycle starts with a case, i.e. a real world situation, which is
interpreted in order to generate an abstract description of the case in the terms
that the legal sources use. This abstract case description is called a legal
situation, and the knowledge used to produce this step is the world knowledge, which
forms the legal abstract model (LAM). Then, the legal situation is analyzed against
the normative knowledge to verify whether it violates any norm, thus producing what
is called a classified situation (a situation classified as either ‘allowed’ or
‘disallowed’). In another path, the situation is analyzed using again world
knowledge (but here particularly its causal component) in order to find out which
agents in the world (if any) have caused the situation. This information is then
used as input to the responsibility knowledge which determines which agents (if any)
are to be held responsible for the situation. The results obtained in these two
paths (the classified situation and the responsible agents) are then used as inputs
for a function that defines a possible legal reaction using reactive knowledge.
Further, outside this cycle, the law may also create an abstract entity (part of the
legal system) using creative knowledge; this entity is also added to the legal
abstract model. Finally, meta-legal knowledge refers to all these entities.

Fig. 1. FOLaw functional core-ontology distinguishing types of knowledge and
dependencies

Another way to see the interdependencies shown in Fig 1 is that they provide the
connections between the (sub)functions from a reasoning point of view. That is,
legal reasoning can be made modular, with each function corresponding to a module
and the dependencies between the modules being provided by the dependencies between
the functions. Such dependencies must of course be detailed.

The main path in Fig 1 can also be seen as the global structure of legal arguments:
starting from the ‘facts of the case’ and going up to sentencing. Each category
corresponds to a type of argument that has as antecedents the inputs and as
conclusions the outputs of each function, and as warrants the knowledge belonging to
that category. For normative knowledge, for instance, the conclusion is whether a
situation is allowed or disallowed, and the warrants are normative knowledge.
Moreover, the conclusions in a legal argument are concatenated as shown by the
dependencies in the figure; for instance, an argument involving world knowledge
(say, concluding that a certain person is considered a ‘minor’ according to a
certain definition) being used as subsidiary for an argument involving normative
knowledge (say, concluding that a situation in which this person was driving a car
is disallowed according to a certain norm). Legal reasoning can thus be seen as the
production and analysis of arguments involving one or more of these categories.

Ours is not the only core ontology for law (see e.g. [16] and [15]). Also, the work
by McCarty on a “language for legal discourse” can be viewed as a core ontology for
legal domains [9]. Although the ontologies are structurally very different, there is
an important overlap of categories. The fact that competing ontologies have been
proposed has led to a reflective debate in the AI & Law community (e.g. [17]).



3 Testing Regulations by Situation Generation: TRACS 

In regulations, one will hardly find statements about responsibility:
responsibility, e.g. guilt or liability, is generally established using (common
sense) causal reasoning [6]. In general, a regulation contains definitions of terms
(world knowledge) and deontic statements (normative knowledge). This is also the
case for the RVV-90, the Dutch traffic regulation. In order to test this regulation,
we constructed a KBS, TRACS (see Fig 2).

There are two knowledge bases (KB), corresponding to the distinction between world
knowledge and normative knowledge in FOLaw: one that contains a model of the domain
of law (World Model), and one that represents the regulations (Legal Source). The
World Model consists of the objects, agents and actions that can be used to generate
or interpret situations in the world, i.e. cases. The legal world of traffic is very
abstract and simplified compared to that of the physical/social one. Spatial
reasoning is reduced to positions on parts of roads; time is of no importance,
because the law does not look at the past, but only at a situation - e.g. being
parked - or at a single action - e.g. crossing. The ontology consists of types of
“traffic participants” (agents/vehicles), roads and parts of roads, and actions,
which are represented naturally by a terminological classifier (LOOM). There are
rules to compose or parse roads and crossings. Because of symmetry and many other
constraints, the number of possible generic situations for this world is limited,
but still combinatorial. This is of importance because we wanted to test the new
traffic code (RVV-90) by generating all possible situations. The user in Fig 2 is
replaced by a module that is a “situation generator”, which produces all possible
situations, legal or illegal. There is no user to feed the system - it runs in batch
mode - and only the outputs are checked.

Fig. 2. Architecture of TRACS for testing regulations [den Haan, 1996]

The legal reasoning starts when a situation description is fed into the Regulation
Applier. This component is a simple production system that matches the situation
description to the representation of the regulation in the Legal Source. A legal
norm is represented as a generic situation description with functions that represent
the deontic operators and yield “allowed/disallowed” as output when applied to a
situation description. This formalism, as described in [13], enables us to avoid the
use of deontic logics, which are known to have no tractable realizations and need
exotic extensions to handle ‘paradoxes’.

A legal norm matches a situation description if each of its conditions can be
unified, directly or via a sort-of relation, with the states of the situation
description. This means that the legal norm is applicable. For instance:

SITUATION DESCRIPTION: (a', b', c', d'),
whereby c is-a-sort-of f, matches
LEGAL NORM: a, b → prohibited(f)

e.g.
SITUATION DESCRIPTION: (place(my-volvo, crossing1),
  consists-of(crossing1, road1, road2),
  is-parked(my-volvo), lights(my-volvo, on))
whereby instance-of(my-volvo, car), sort-of(car, vehicle), and
sort-of(parking, stopping)
LEGAL NORM: (place(car, crossing), consists-of(crossing-x, road-x, road-y))
  → prohibited(is-stopped(vehicle))
(art. 15: “at a crossing of roads it is forbidden to stop a vehicle”)
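
The matching step can be illustrated with a much simplified sketch. It is not the TRACS or LOOM representation: states are reduced to hypothetical (state, object) pairs, the sort hierarchy is a plain dictionary, and all names (`SORT_OF`, `INSTANCE_OF`, `is_a`, `norm_applies`, `ART_15`) are assumptions introduced for this example.

```python
SORT_OF = {"car": "vehicle", "parking": "stopping"}       # sub-sort -> sort
INSTANCE_OF = {"my-volvo": "car", "alice": "pedestrian"}  # object -> sort


def is_a(sort, target):
    """True if `sort` is `target` or a (transitive) sub-sort of it."""
    while sort is not None:
        if sort == target:
            return True
        sort = SORT_OF.get(sort)
    return False


def norm_applies(norm, situation):
    """True if one object of the right sort satisfies every condition."""
    objects = {obj for _, obj in situation}
    return any(
        is_a(INSTANCE_OF.get(obj), norm["subject"])
        and all(any(o == obj and is_a(state, cond) for state, o in situation)
                for cond in norm["conditions"])
        for obj in objects)


# Hypothetical encoding of art. 15: no stopping of a vehicle at a crossing.
ART_15 = {"subject": "vehicle",
          "conditions": ["at-crossing", "stopping"],
          "verdict": "disallowed"}

parked_volvo = [("at-crossing", "my-volvo"), ("parking", "my-volvo"),
                ("lights-on", "my-volvo")]
waiting_pedestrian = [("at-crossing", "alice"), ("stopping", "alice")]
```

The norm applies to the parked car because `parking` is a sort of `stopping` and `car` is a sort of `vehicle`; it does not apply to the pedestrian, whose sort is unrelated to `vehicle` in this toy hierarchy.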

More than one norm may be applicable. If the applicable norms have overlapping
parts and conflicting outputs, the norms are inconsistent. However, these
inconsistencies may be due to ‘exceptions’. If they are exceptions, they should be
resolvable by the known meta-rules of the law, the most famous being “lex specialis
derogat legi generali”, i.e. the more specific wins, which is similar to conflict
resolution mechanisms in production systems. As TRACS was only a prototype, the
first trials were made when an exhaustive situation generator was only specified on
paper. However, these trials were already so illuminating that no further testing
was required. Almost all cases that were specified by hand (about 40) led to
unexpected inconsistencies which were not due to exceptions, or to outcomes that
were obviously against the intentions of the legislator.
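
The "more specific wins" meta-rule can be sketched as follows. The encoding is hypothetical: each norm is reduced to the sort it addresses and a verdict, specificity is read off a toy sort hierarchy, and the two sample norms are only loosely modelled on Art 3 and Art 6 quoted earlier.

```python
SORT_OF = {"bicycle": "vehicle"}   # sub-sort -> sort (hypothetical hierarchy)


def is_a(sort, target):
    """True if `sort` is `target` or a (transitive) sub-sort of it."""
    while sort is not None:
        if sort == target:
            return True
        sort = SORT_OF.get(sort)
    return False


def resolve(norm_a, norm_b):
    """Return the winning norm, or None if the conflict is not an exception."""
    if norm_a["verdict"] == norm_b["verdict"]:
        return norm_a
    a_below_b = is_a(norm_a["subject"], norm_b["subject"])
    b_below_a = is_a(norm_b["subject"], norm_a["subject"])
    if a_below_b and not b_below_a:
        return norm_a          # norm_a is the more specific one
    if b_below_a and not a_below_b:
        return norm_b          # norm_b is the more specific one
    return None                # genuine inconsistency


# Hypothetical encodings of "riding beside another road user".
ART_3 = {"name": "Art 3", "subject": "vehicle", "verdict": "disallowed"}
ART_6 = {"name": "Art 6", "subject": "bicycle", "verdict": "allowed"}
RIVAL = {"name": "rival", "subject": "vehicle", "verdict": "allowed"}
```

A `None` result marks the kind of conflict that cannot be explained away as an exception, because neither norm is more specific than the other.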

The results of TRACS were surprising in the sense that a number of really
nasty contradictions - not of the permission/exception type - were brought to
light in testing the RVV-90, the Dutch traffic code [3]. For instance, every
situation containing a tram on the tramway was identified as a violation of an
article that excepted a list of types of vehicles from taking the driving lanes
(riding track). The tram is not in that list, so that the tram is obliged to take
such a lane. As the example shows, inconsistencies may easily arise in indirect
ways. We also found that often there are no easy, straightforward repairs. For
instance, adding the tram to this exception list has other nasty side effects: it
would only mean that the tram should use the sidewalk. It turned out that most
repairs required either ‘structural’ re-drafting, or ad-hoc articles. For instance,
adding the article that trams should run on the tramway would be such a repair,
but the legal drafters thought this was too stupid for words.



4 Generating Regulations

Legislative drafting can be semi-automatically supported in at least three
different ways, which are complementary. The first one is by providing
information services that relate general legal information to the specific tasks
of the legal drafters. An example is the LEDA system, which follows the officially
recommended drafting procedures, directives and styles [18]. LEDA offers access to
relevant legislation in hypertext format, and provides the initial normative
structure formats. A second way is by providing tools that check the consequences
of a regulation. ExpertiSZe [11] and TRACS [2] are examples. In this paper we
also present a third kind of tool and approach, which automates the construction
of paraphrases of regulations on the basis of normative goals ((un)desired
social behaviour) and a model of the domain (world knowledge) to be regulated.
The same normative goals can be expressed in different ways for different
purposes (e.g. legal subjects concerned), like paraphrases of text that express the
same underlying conceptualization [21]. In this paper we present procedures for
generating paraphrases of norms.



4.1 An Outline and Rationale for Generating Normative Rules 

The basic assumptions and steps in the process of generating regulations can be 
summarized as follows. As explained in Section 1 a legal domain refers to some 
social subsystem. A description of how such a world actually works, i.e. what
behaviours can be exhibited by this world, we have called the legal abstract model
(LAM, see Fig. 1). Behaviour at a particular moment is called a situation,
consisting of some configuration (relations) of objects which are in a particular
state. Situations can be instantaneous (actual), or generic (abstract). Norms are 
generic situations marked as being obliged, forbidden or permitted. ^ 

The LAM should cover all possible situations in the domain. As these cannot 
be enumerated, a LAM is a generic abstraction similar to behavioural models 
in model based reasoning. In the current practice of legal drafting, no explicit 
modeling occurs. It is implicit in the understanding of the problems and the 
debates - political and technical - that are preliminary to and contingent upon 
the drafting activities proper. Making such a model explicit in a process of 
knowledge acquisition, as automation requires, may have the important side 
effect of clarifying more precisely and coherently what is at stake in a law. It 
may result in well defined terms, which may be an explicit part of a regulation 
as well. The disadvantage is also obvious: it takes a lot of effort - one or
more person-years - which may be relatively large for small legal drafting projects
(e.g. local repairs). A second step is going from an articulate and parsimonious
world model to a version that consists of a list of all possible situations: this
we call situation generation (see below, Sec. 4.2).

The partitioning of a world model into two disjoint sets of illegal (undesired)
and not illegal (not undesired) situations is called the Qualification Model (QM)
[2]. Because in (political) practice normative goals (rather: intentions) are
formulated in more global and vague terms than legislation requires, the process
of turning these into the QM specification involves assessment.

The final step involves the construction of one or more regulation models
(RM). An RM is not the same as the text of a regulation. It consists of a structure
of norms, which can be expressed in various (ways in) natural languages. A norm 
has a simple structure as explained in the previous section, but the inter-norm 
structure may be rather complex, because it contains exceptions, which may be 
exceptions to exceptions etc. This structure is called the exception structure. 

Figure 3 depicts the steps and their dependencies in this process. In the
following sections we discuss situation generation (4.2) and norm generation.

^ In fact, these deontic operators are pragmatic, i.e. efficient expressions to translate
the goals of the legislator, which are in principle in terms of legal/illegal. We present
here a rather narrow view of norms. Norms may contain rights and duties [8, 10],
but we will not discuss these complications here (see also [12]).

Joost Breuker et al.

[Figure 3: diagram; recoverable labels: RM-1, RM-2, RM-3]

Fig. 3. Generating regulation models from world knowledge and normative goals



4.2 Generating Situations

A qualification model (QM) is a further qualification of the set of possible
situations that models a specific domain. It is possible to generate such a set
automatically, from the definitions of terms and by combining the possible states
of objects.

First, we need an ontology that provides the definitions of terms [14, 13].
In legal domains, human agents play a pivotal role, so we will distinguish two
types of entities: (human) agents and objects. Further, agents and objects can
be related in dynamic and in static ways. The dynamic relations are actions or
processes, while the term relation will denote all other types of relations.
Whereas attributes are typically one-place, relations, actions and processes have
in general more arguments. For each type of action or process defined in the
ontology we can, in principle, generate in a combinatorial way all types of
arguments (objects, agents) that may fill these roles. Common sense ontologies
may be used to obtain the semantics of these actions (e.g. WordNet, CYC).

For instance, in a simplified combinatorial view an action like give has two
agents (actor and recipient) and one object (for the object role). Now, if we
distinguish in the domain three types of agents, and four types of objects,
3 * 3 * 4 = 36 possible “giving” situations are generated.* To illustrate what is
involved, we may assume that situations consist of variables, agents (a), actions
(c), objects (o) and relations (r). Furthermore, there are not only elementary
situations that consist of one action, but also complex situations in which more
actions or relations are involved ($N_{cr}$ per situation):

— Number of agents: a

— Number of objects: o

— Number of actions: c

— Number of relations: r

— Average arity of action and relation predicates: n

— Average number of actions and relations per situation: $N_{cr}$

* This is not really correct. An action or process changes a situation into another
situation. Therefore, one may assume that the give-action should be decomposed into
an antecedent situation1 in which an agent possesses an object, and a consequent
situation2 in which another agent possesses the object. Both situations are causally
and intentionally related by the agent who is the actor of the change of possession
(giving). This conceptualization is ontologically more correct, but does not really
change the notion of combinatorics involved in situation generation. In this paper
we treat actions as parts of situations; not as ‘bridges’ between situations. The
latter view complicates, but does not really change the processes of situation and
rule generation. It is part of current research.

Each predicate may be an action or a relation ($c + r$). Each argument is
either an agent ($a$) or an object ($o$). Each predicate has arity $n$, so there are
$(a + o)^n$ possible combinations of the arguments per predicate. This yields the
total of possible actions and relations, which is the product of $(c + r)$ and
$(a + o)^n$:

\[ Tot_{pred} = (c + r) \cdot (a + o)^n \]

Suppose that a situation consists of an average of $N_{cr}$ action or relation
predicates; then the number of possible situations is:

\[ Tot_{sit} = (Tot_{pred})^{N_{cr}} = ((c + r) \cdot (a + o)^n)^{N_{cr}} \]

Indeed a classical combinatorial explosion. However, this is a worst case anal- 
ysis, because we have only blindly applied the terms and types from the ontology 
and used an almost unconstrained version of the action frame (only the types of 
role fillers are given). 

There are various sources for limiting the number of possible (and
meaningful) situations (see also [1]):

— Redundancy and tautology 

There is a lot of redundancy when one does not carefully choose basic terms 
in representing knowledge. Many relations may have tautological family 
members. For instance, in the traffic domain there is a lot of symmetry 
and inversion. 

— Constraints 

When the world knowledge contains descriptions of the physical limitations 
in the world, the physically and logically impossible situations are pruned. 
For instance, an object cannot be at different locations at the same time. 
Many of these are of a very general nature, like the roles in actions or phys- 
ical principles, but there are often many domain specific ones as well. For 
instance, in the traffic domain, all actions are viewed as taking place in ‘only’ 
two dimensional space. 

— Abstraction or grain size level

As in all modeling, and particularly in AI, abstraction is what keeps the
world manageable, even if it sometimes leads to incorrect identifications at
lower levels of grain size.

— (Legal) Relevance

In modeling worlds for legal (normative) control, worlds are only considered
from the point of view of enforceable law. For instance, a traffic regulation is
aimed at enhancing the safety of the participants.

Filling the values of the traffic world into the equation above yields:
$Tot_{sit} = ((c + r) \cdot (a + o)^n)^{N_{cr}} = ((20 + 20) \cdot (15 + 20)^2)^2 = 2{,}401{,}000{,}000$.
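
The worst-case count can be checked with a few lines of Python. The function names are illustrative. The exponents n = 2 and N_cr = 2 are assumptions: they are taken here because, together with c = r = 20, a = 15 and o = 20, they reproduce the total reported for the traffic world.

```python
def total_predicates(c, r, a, o, n):
    """Tot_pred = (c + r) * (a + o)**n: predicate choices times argument fillings."""
    return (c + r) * (a + o) ** n


def total_situations(c, r, a, o, n, n_cr):
    """Tot_sit = Tot_pred ** N_cr: worst case, no pruning of impossible situations."""
    return total_predicates(c, r, a, o, n) ** n_cr


# The "give" example: actor and recipient each range over 3 agent types,
# the object role over 4 object types.
give_situations = 3 * 3 * 4
print(give_situations)  # 36

# Traffic world, with the assumed exponents n = 2 and N_cr = 2.
traffic = total_situations(c=20, r=20, a=15, o=20, n=2, n_cr=2)
print(f"{traffic:,}")  # 2,401,000,000
```

Because the count grows exponentially in both n and N_cr, each of the pruning sources listed above pays off multiplicatively rather than additively.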

Once the situations have been generated, they have to be distributed over two 
sets to form the qualification model QM. We mention three possible approaches: 

— Qualify each situation by hand according to the initial normative goals. An
important source for this kind of qualification may be case law (precedents).

— Formulate a limited number of situations of a high abstraction level, and 
qualify these by hand according to the initial normative goals or intentions. 

— Because legal drafting hardly ever occurs with a completely fresh start, one 
may qualify each situation using the older version of the regulation, using 
automated legal assessment. As in most political and technical debate the 
differences between the old and the new are emphasized, it is not difficult to 
use this method followed by the first one. 

For the relatively small domains we are using as our testbeds, qualification
by hand is not yet a problem. Nor do we think that in practice one will ever
have to rely only on this by-hand method; rather, one will rely on the last one.

4.3 From Qualification to Regulation

The input for the norm generation is the QM, which consists of the subset
$S^{-}$ of the undesired situations, and of the subset of the not undesired
situations. $S$ is the normative default chosen, i.e. in law this is in general
$S^{-}$, reflecting the principle that what is not forbidden is permitted in law.
$\sigma$ is a situation. RM is the set of generated norms; p is a norm.

The overall algorithm for the generation of norms is defined as follows:

    while S ≠ ∅
        select(σ, S)
        translate(σ, p)
        add(p, RM)
        adjust(p, RM)



Here we do not further elaborate the select and add procedures as they are 
simple set operations. Below, we discuss the translate and adjust procedures. 
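
The loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation described in [2, 4, 5]: `translate` and `adjust` are passed in as parameters, `select` is taken to remove the chosen situation from S, and `adjust` is assumed to return the situations that the new norm already covers so they can be dropped from S.

```python
def generate_norms(undesired, translate, adjust):
    """Build a regulation model RM from the set S of situations to be covered."""
    S = set(undesired)          # situations still to be turned into norms
    RM = []                     # the regulation model: generated norms
    while S:                    # while S is not empty
        sigma = min(S)          # select: deterministic pick (assumption)
        S.remove(sigma)
        p = translate(sigma)    # situation -> rule-like norm
        RM.append(p)            # add
        S -= adjust(p, RM, S)   # assumed feedback: drop situations p covers
    return RM


# Toy run with hypothetical situations encoded as (agent, action, place).
S_minus = {("car", "stopping", "crossing"), ("bus", "stopping", "crossing")}
rm = generate_norms(
    S_minus,
    translate=lambda s: {"conditions": (s[0], s[2]), "forbidden": s[1]},
    adjust=lambda p, RM, S: set(),   # no generalization in this toy run
)
print(len(rm))  # 2
```

With an `adjust` that generalizes (for instance from "car" and "bus" to a common sort), fewer norms would be generated, at the price of possible exceptions.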

The translation procedure is required, because norms in regulations are in
general not expressed as situations, as in the QM, but as rules, which means that
a particular action in a norm is selected as a forbidden, permitted or obliged
consequent, while the other situation descriptors are viewed as conditions, or
‘grounds for application’. For instance,
$O((\textit{agent-1}, \textit{age}(> 18)) \wedge (\textit{agent-1}, \textit{voting}))$
is expressed as “if one is older than 18 years, one is obliged to vote”. In a
formal sense, these expressions are equivalent, in particular because the “if”
does not mean a real (material) implication, but rather a conjunction.
This translation is required for communicative (pragmatic) reasons rather than
for semantical reasons. Besides producing these expressions, the translation
procedure also contains the algorithms to select the application grounds. The more
abstract the application grounds, the simpler the rules become, but also the
more exceptions the regulation will contain. The adjust algorithm takes care of
the ‘sign’ of the deontic operator to create exceptions in an efficient manner, and
feeds back constraints to the select procedure. The algorithms are described in
more detail in [2, 4, 5]. Thus far, these have been implemented as extensions to
TRACS and tried out experimentally in small, artificial domains.
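
A minimal sketch of the translation step for the voting example, assuming a situation is a list of descriptor strings and that the caller supplies a hypothetical `is_action` test. The real procedure also selects and abstracts the application grounds, which this sketch does not attempt.

```python
def translate(situation, operator, is_action):
    """Split a qualified situation into application grounds and one consequent."""
    actions = [d for d in situation if is_action(d)]
    if len(actions) != 1:
        raise ValueError("this sketch expects exactly one action descriptor")
    consequent = actions[0]
    conditions = [d for d in situation if d != consequent]
    return {"if": conditions, "then": (operator, consequent)}


def paraphrase(rule):
    """Render the rule as an 'if ... then ...' sentence."""
    operator, action = rule["then"]
    return f"if {' and '.join(rule['if'])}, then {action} is {operator}"


rule = translate(["age > 18", "voting"], "obliged", lambda d: d == "voting")
print(paraphrase(rule))  # if age > 18, then voting is obliged
```

The `is_action` test is where world knowledge enters: without it, nothing in the conjunction says which descriptor should become the consequent.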

5 Conclusions 

We have described how TRACS supports the drafting of regulations in two ways:
by testing regulations (TRACS proper) and by generating regulations (TRACS
extended). TRACS is based upon the FOLaw ontology, in which ‘world’ and
‘normative’ knowledge are distinguished. Pivotal for all legal reasoning
is the world knowledge which models a particular legal domain, independent
of its normative assignments. This world model, or LAM (Legal Abstract Model),
enables the variety of functions of TRACS that support legal drafting. In fact, the
LAM for a particular legal domain may be reused beyond the particular legal
system for which it has been constructed. With the globalization of human
activities, harmonization and coordination of legislation becomes more
and more urgent. In a very explicit manner this is the case in and around the
European Community. The traffic regulations provide a good example, as the
world of traffic is more or less the same all over the world, but the regulations
differ enormously, and it is hard to assess by hand/eye/brain in which respects
they are different. As we have seen in Section 3, it is already difficult for human
legal drafters to trace the implications of an individual regulation; comparing
and adjusting two or more regulations for the same domain becomes a factor
more difficult. In fact, TRACS-test can easily be put into a comparative mode
once a LAM has been constructed, as its systematic situation generator may be
the input to more than one regulation knowledge base and TRACS records all
differences in outcomes [5]. For instance, in this way we have compared (parts
of) the old with the new Dutch traffic regulation. As we have also experienced,
repairing and adjusting regulations is not a self-evident task either, as it may
bring new design errors. Therefore, we view TRACS-extended as a tool
for adjustment, repair and communication in legal drafting rather than for
full-blown legal drafting from scratch.

TRACS has thus far been only an experimental testbed. Its basic philosophy
and architecture are currently applied in an Esprit project, CLIME
(http://www.bmtech.co.uk/-clime/index.html), aimed at information serving of
huge regulation bases, based upon a LAM and assessment algorithms similar to
those in TRACS [19], [20]. However, we have identified at least two problem areas
that may limit our approach.

— Situation generation is a combinatorial process. Although there are many
ways to reduce complexity, many of which are classical in AI, they may not
completely lift all barriers to scaled-up applications. This is typical for model
based approaches in AI. It should be noted that the same combinatorial
problem is also one of the major reasons that human legal drafters - or
for that matter, society - are not capable of tracing the implications of any
non-trivial regulation. In fact, drafts of legislation are assessed in practice by
evaluating a set of typical cases and having interest groups give comments.
Even a ‘case-based’ input rather than full situation generation may improve
quality, as the test runs of TRACS have shown.
— Our account of the translate process has been somewhat over-optimistic. If
we want to translate generic situation descriptions and we have to distribute
these over application grounds and conclusions, we probably have to assume
too much common sense knowledge about the world. For instance, TRACS
may also decide that, in the example about voting, an agent who votes is
obliged to be over 18 years old. In other words, it should be able to distinguish
between actions over which an agent has control and those processes and
actions over which he/she has none. Again the semantics of the actions may
help, but the basic message is that we will hit the same bottom here as many KBS
applications: the need to fall back on common sense reasoning about the
world. Indeed, legal reasoning proceeds in close association with common
sense reasoning.

Abstract. We present in this paper our reflections about the
requirements of new architectures and languages for the Web,
confronted with the ones emerging from qualified scientists such as
McCarthy [1] and Wegner [2]. The contribution highlights if and how
these reflections may be concretely realized by means of extensions of
non standard models and tools that we have already experimented with and
that appeared in previous papers (the STROBE model and Phi
Calculus). We conclude with the preliminary specifications of a new
language for modeling and programming Interactions, called C+C, that
represents constructively our approach, privileging the
communicational aspects among Autonomous Agents, with respect to
the more traditional algorithmic ones.

Keywords: Agent Communication Languages, Interaction Languages,
Web Languages.


1 Introduction 

Communication among intelligent Agents on the Web is one of the popular
research issues at the moment. Enhancing the expressiveness of Web documents is
another fashionable research area. They reflect the view that Computing consists of
active elements, i.e. Programs - such as Agents [3] - and passive ones, i.e. Data - such
as XML [4] (eXtensible Markup Language) Documents or ACL [5] (Agent
Communication Language) Messages. Active and passive elements, perhaps
structured as Objects, interact with each other and with Human users in order to
activate Processes in the modern context where the Computer has become a notion
relative to any singular Agent: anything on the Web that may help in solving
problems.

In this totally new scenario most of the efforts are focussed on extending traditional
models, languages and systems in order to account for the Web. Even the less
formalized streams of research in Computing - Agents and Markup Languages -
seem to consist essentially of technical extensions of old paradigms. Fundamental
research approaches to Agents - and researchers in the area - look for an official,
theoretically supported recognition of Agents as an extension of the Actor or Object
paradigms. This seems quite odd, since even formal theories justifying Actors and
Objects, when and if they have been developed (see, e.g. [6]), are far from becoming
a justification for the impressive expansion of usage of Objects in the real Computing
scenario. Similarly, the emergence of XML-like standards and languages and, in
parallel, of industrial interests around Agent Communication Languages based on
Speech Acts and Ontologies stimulates endless debates about what is what, always in
traditional Computing terms. These often focus on the old need to develop standards
at the syntactic level, while trying to extend the standards at the semantic one. Both
trends are justified. However, the first need is often in contradiction with the second
one, which depends on the application and therefore on the viewpoints of real users.

For instance, people often ask simple questions such as: is an Agent
different from a Program [7]? Is an XML document different from a complex datum
[8]? Is there a semantics associated with XML documents, or are DTDs just ways to
express well-formedness properties of tags and nothing more?

Wegner [2] has several arguments for supporting the view that there is an essential 
difference between algorithms and interactive systems. We did not really work out 
the same profound reflections, but our intuitions years ago do match well with his
conclusions [9,10]. We agree with Wegner that computing on the Web has to be
entirely revisited analyzing the interactive communicational phenomena that have no 
counterpart in traditional models, such as Turing Machines, Lambda Calculus or Von 
Neumann machines. These historical models are essentially founded on the closed 
world assumption. 

McCarthy [1] understood perfectly the way to follow when he wrote: “A very
large fraction of office work involves communicating with people and organizations
outside the office and whose procedures are not controlled by the same authority.
Getting the full productivity benefit from computers depends on computerizing this
communication. This cannot be accomplished by approaches to distributed
processing which assume that the communication processes are designed as a
group.”

In the cited papers [9,10] we wrote: “The paper reflects on the difference between
knowledge systems and knowledge communication systems. Most real world
situations are inherently open: there is a need for unforeseen interactions with
external autonomous agents such as humans or programs. [...] For these
reasons, the theme of understanding the partner in dialogues becomes central.
[...] The example shows in a simple way that if a computational object models a
situation that has a history, the course of which cannot be foreseen in any possible
way (such as the history of interactions between the bank account object and the
requests for withdrawals issued by external agents) that object needs state variables
to be able to model the history and therefore the "context". We must now ask, when
is it the case that a situation is characterized by a history the evolution of which
cannot be foreseen in any possible way? Is a flight control program such a situation?
Is a chess program such a situation? Is an information system, a diagnostic system, a
scheduler, an expert system in this situation? The question can be rephrased in terms
of closed worlds. When is it the case that we are entitled to design models of a
situation under the closed world assumption?”

We are not sure that we will be able, in this paper, to convince the reader that the
two contradictory requirements indicated above - new theoretical foundations for
Interactive Systems and practical tools for Web processing - may coexist, and even be
synergistic in a promising joint effort. We present here just a partial view of the
results of preliminary work in progress that testifies to the convergence of ideas that
emerged from, and were documented by, the senior authors (JS, SAC) in the course of
the last years, and that currently ground the research activities of the junior authors
(EC, DM) and will ground their future ones.



2 Rationality, Adequacy and Learning in Collaborative Theory 
Formation: Phi Calculus 

This stream of work conducted mainly by Jean Sallantin and his group [11-16] 
starts from a view of Artificial Intelligence that is non standard, even if it is shared by 
an increasingly important number of scholars. The view is summarized hereafter. 

In Artificial Intelligence, we look for machines able to amplify the capacity for the 
human to solve problems. At the core of these studies, we have two questions: 

1. how to define the interactions between theory and practice in problem solving;

2. how to define the interactions between humans and machines or among 
humans through machines during the problem solving process. 

Both core questions are centered around the notion of Interaction, i.e. mutual 
action resulting from dialogues among Human and/or Artificial Agents. More 
specifically, the main question is to study how a dialogue may lead to an agreement 
legally acceptable on actions eventually performed by the participants. 

The application investigated and proposed by Castro et al. [13] does not assume 
the availability of a common “Ontology” between Agents, but instead wishes to look 
at the communicative phenomena emerging in order to build incrementally such a 
shared Ontology, in such a way that the shared Ontology expresses the conditions for 
the agreement. In other words, it looks at the rationality of interacting Agents when
they attempt to converge towards a shared view of a domain, represented by an 
Ontology. 

According to these authors’ approach, it is not realistic to expect that Ontologies
are standards that anyone has to accept and learn in order to solve problems
collaboratively, a kind of “shared truth”. This would imply eliminating negotiations,
i.e. assuming to be in a completely known environment where methods of problem
solving are known, mastered and have been previously applied with success.

On the contrary, the first step in solving problems collaboratively consists of 
building a shared Ontology with the partners (see also the same ideas expressed in the 
literature on cognition and language [17]). This need for separating the shared 
rationale in question answering - called: the search phase - from the access phase, 
typical of well formulated questions, was also behind the work of [18]: before the 
user may put a query (s)he has to know if and how the system may offer answers in 
the domain of the query. 

While most work on Ontologies attempts to study the first order logical
well-formedness of Ontologies (consistency and/or completeness with respect to
deducible formulas), Sallantin et al., modestly and effectively, reduce Ontologies to
hierarchies of Terms, Propositions about the terms and Constraints. Ambitiously,
instead, they guide their Agents - human or artificial - to “learn” how to eliminate
the paradoxes occurring during the dialogue engaged in for constructing the Ontology.

The interactive communication process is the object of study for them. The
control process is just state of the art: trees of symbols, constraint propagation and
standard machine learning are sufficient even for applications such as the one in [15]
where negotiation allows the management of documents.

The communication process, observed from outside, seems quite obvious: it
consists of clicking on a screen, entering a term, relaxing a constraint, validating an
example, learning some constraints on the examples, keeping or eliminating or using
examples for generalizing and the like. But the semantics and pragmatics associated
with these - apparently simple - actions are much more complex, as they address
notions such as abstraction, generalization, rationality (limited, deliberated, computed),
adequacy (ontological, heuristic, epistemological), postulate, axiom, theorem, fact,
hypothesis, lemma, justification, promise, exists, confirmed, the modals will, can,
may, know, wish, etc. The analysis of how these notions are hidden behind
interactions among generic Agents leads to systems that facilitate their
explicit use. This is the goal of the research.

In general, these notions emerge historically mainly from Philosophy and therefore
have been subjectively described by philosophers. Sallantin et al., by adopting these
philosophical notions at the kernel of their studies, attempt to rigorously define a
framework for theory formation by revisiting the process in terms of eliminating or
circumscribing paradoxes in the collaborative construction of Ontologies by rational
Agents. If it is acceptable to say that the first step in any collaborative problem
solving process is the construction of a shared, adequate Ontology, then this
construction becomes the main objective of formal as well as practical studies on the
future of Computing and AI.

Their objective - called Phi Calculus (Philosophical Calculus) -, in order to be 
acceptable outside the community where it emerges, requires: 

a. a formal (mathematical) theory, 

b. a methodology for the development of software interactive applications - a 
language for interacting Agents - and 

c. a set of real world applications demonstrating the value added by the approach 
and the language for implementing the consequent solutions. 

In other documents we present the current results of a. [14] and c. [16]. Hereafter
we concentrate on b., because its results have been reached jointly with the previous
stream of work around the other senior author of this paper (SAC).



3 A Model and a Language for Generic Dialogues: STROBE 

In a series of papers that appeared both at conferences and in edited books about
Artificial Intelligence and Education, Intelligent Tutoring Systems, Functional
Programming, User Modeling, Agents and Soft Computing [19-22], and also in a
number of unpublished master’s theses [23-26], the authors have presented a non
standard model of Agent communication, called STROBE, and a few excerpts of
experiments on implementations of the model using different technologies.
The novelty of the model may be summarized as follows.

1. The model emerges from observations of the communicative behavior of Agents;
not from their control behavior as is frequently the case. Agents are considered
firstly communicative entities, then control entities.

2. The model is not a formal - mathematical or logical - model, even if it has been 
used for implementing several prototypes. It is not an extension of lambda 
calculus, even if it uses Scheme as a description language for complex, newly re- 
visited, communication and control phenomena. In spite of the fact that it is not a 
formal model, it is embodied in usable programs expressed by formal languages 
(mainly Scheme and Java). 

3. The model does not pretend to solve all the problems envisaged, but allows one to
reduce the problems to a minimum set of simple linguistic and computational
concepts. These are, in essence, the notion of STReam (delayed evaluation)
modeling the Communication among Agents; that of OBject as a
function-instance resulting from another function-class in a language where
functions are first class, modeling the Control of Agents; that of Environment as a
first class abstract data type in a dynamically typed language, modeling the Memory.
The important consequence of this view is that one may dedicate first-class
Environments to Conversations within Agent pairs, as well as, for each Agent
pair, to Ontologies. The explicitly labeled tree-structured knowledge repositories
have been called Cognitive Environments. As a consequence, Cognitive
Environments may be used for modeling in a straightforward way the supposed
partner’s beliefs, exactly when they are partially inconsistent with the Agent’s
beliefs. There have not been further studies on how exactly to cope with
Cognitive Environments as Agent’s memory models, but we are confident that
these may inherit algorithms from currently available practices in
non-homogeneous, federated Databases.

4. The model accounts poorly for the intrinsic asynchronous behavior of Agents.
All that is said is that an Agent should have a dynamic scheduler that
asynchronously processes incoming messages and produces outgoing messages.
In every figure published, however, there is no mention of the fact that the clock,
in Agents’ conversations, is not a shared variable. Figures and tables seem to
respect synchronization; thus they do not reflect what the text says.
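The three notions listed in point 3 can be illustrated with a minimal sketch. It is written in Python rather than Scheme purely for illustration, and every name in it (`message_stream`, `make_counter_object`, `environments`) is hypothetical, not part of STROBE: a generator stands in for a STReam, a closure returned by a function stands in for an OBject, and a dictionary of dictionaries stands in for first-class Environments dedicated to partners.

```python
from typing import Callable, Dict, Iterator, List


def message_stream(acts: List[str]) -> Iterator[str]:
    """STReam: each message is produced only when the consumer asks for it."""
    for act in acts:
        yield act


def make_counter_object(start: int) -> Callable[[str], int]:
    """OBject: a function-instance (closure) produced by a function-class."""
    state = {"value": start}

    def dispatch(selector: str) -> int:
        # The object is a dispatch function on message selectors.
        if selector == "increment":
            state["value"] += 1
        elif selector != "read":
            raise ValueError(f"unknown selector: {selector}")
        return state["value"]

    return dispatch


# Environment: first-class name-to-value tables, one dedicated to each partner.
environments: Dict[str, Dict[str, object]] = {
    "P": {"topic": "geometry"},
    "Q": {"topic": "algebra"},
}

counter = make_counter_object(0)
stream = message_stream(["increment", "increment", "read"])
results = [counter(selector) for selector in stream]
```

Here `results` collects the value returned after each message is consumed from the stream; the per-partner tables in `environments` remain ordinary values that can be passed around, copied or nested.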

In the rest of the paper we will present an improvement of the model as the basis
for our proposed language C+C. Improvements concern the following aspects:

a. time will be explicitly considered proprietary for each Agent engaged in the
transaction, and not necessarily shared by any pair of Agents. Synchronization of
events is not ensured by the model. Synchronization, if any, is left to the
application designer (cf. the discussion on Time at page 9 of [1]);

b. there is an intuitive, generic description of Agents in terms of a (functional)
semantics. We may consider this as a preliminary specification of C+C;

c. there are guidelines for a constructive method for designing the communicative 
level, i.e. the primitive and the composite “Acts” that may be required in order to 
make Phi Calculus an effective methodology, thus in order to develop the 
corresponding applications; 
d. there are realistic guidelines for exploiting the notion of Stream and
implementing non-strictness in parameter passing (lazy control) and a distributed,
asynchronous, lazy communication model accounting for a view of the Web as
an extension of the local memory.

In order to reach that level of clarity in the description of our approach, let us
consider first an important external contribution that may be supportive, and
controversial, with respect to our own choices: that of McCarthy’s Elephant 2000
paper [1].



4 Elephant 2000 

In a paper dated 10 June 1989 (which appeared much later on the author’s Web site),
John McCarthy presents his view of a programming language for the 2000s [1].
He called it Elephant 2000 because, as he writes, “an elephant never forgets”.

Hereafter follows a set of annotated statements from the cited paper.

Goal of the proposed language: Interaction 

Elephant 2000 is a proposed programming language good for writing and 
verifying programs that interact with people (e.g. transaction processing) or 
interact with programs belonging to other organisations (e.g. electronic data 
interchange). 



It is important to recognize that there is no significant difference in interacting 
with people or with programs. 

Writing and verifying are the two aspects. What “good” means is still an issue. 
Does he mean: expressive enough (for writing, for verifying ...)? Or also: simple to 
use (in writing, in verifying ...)? 

Fundamental components of Interactions: Speech Acts 

Communication inputs and outputs are in an I-O language whose sentences are 
meaningful speech acts identified in the language as questions, answers, offers, 
acceptances, declinations, requests, permissions and promises. 



A first list of Speech Acts is made explicit. The semantics should be formally
defined. For instance, a question may have the intention to know the answer, when
the querier is interested in knowing it and does not know it (see: Information Systems),
or to test if the partner knows the answer and, in the positive case, if the partner’s
answer coincides with the querier’s view of what the partner knows (as in Tutoring
Systems, when the Tutor tests the Student’s knowledge). A careful analysis of needs
around Speech Acts in context [27] suggests that it would be rather utopian to have
“standard” Speech Acts in much the same way as to have “standard” Ontologies.
Focus on performance of Speech Acts with respect to program’s behavior

The correctness of programs is partly defined in terms of proper performance of 
the speech acts. Answers should be truthful and responsive and promises should be 
kept. Sentences of logic expressing these forms of correctness can be generated 
automatically from the program. 



The notion of correctness of a program assumes that a program has to generate a
process that terminates with the solution of the problem. If the problem changes
during the process, even if the program remembers how it was previously, it is
quite hard to define the notion of correctness. Agents are autonomous, and therefore
they may autonomously change their goals during conversations. In the case that
programs represent Agents, the notion of correctness should be replaced by the
notion of adequateness, i.e. the empirical estimate that the Agent will achieve its
goals, that is, reach within each conversation, if possible, the subjective decision
that its current goal in the conversation has been satisfied, so that the
conversation may be considered completed.

Memory model 

Elephant source programs may not need data structures, because they can refer 
directly to the past. 



A language that does not need data structures does not need assignment either (cf.
the programming language Haskell). Thus it is a purely functional language. Notice
that McCarthy probably meant “mutable” data structures (thanks to C. Queinnec for
this interpretation).

The notion of Stream in STROBE neatly models the evolving components of the
conversation. Here McCarthy highlights the need for not forgetting, where forgetting
is a consequence of assignments and traditional memory models. The reason seems to
be that one has to access the whole history of exchanges in order to manage properly a
conversational behavior (as confirmed in [27]).

When attempting to give a synthetic view of Elephant 2000, McCarthy writes:

$\xi(t+1) = \mathrm{update}(i(t), \xi, t)$

where $\xi$ is the state vector representing the program; $\xi(t+1)$ depends on the
whole past, not simply on $\xi(t)$, and:

$i(t) = \mathrm{input}(\mathrm{world}(t))$

where world is the state vector representing the world external to the computer:

$\mathrm{world}(t+1) = \mathrm{worldf}(\mathrm{output}(\xi(t)), \mathrm{world}, t)$

and $\mathrm{world}(t+1)$ depends on the whole past, not simply on $\mathrm{world}(t)$.
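The point that the next state depends on the whole past, and not only on the current state, can be made concrete with a small sketch. It is an illustration under stated assumptions, not McCarthy’s formulation: the state is reduced to an integer, and the names `run` and `sum_of_past` are hypothetical.

```python
from typing import Callable, List

# update(i(t), xi, t): receives the input, the WHOLE state history and the time.
Update = Callable[[int, List[int], int], int]


def run(update: Update, inputs: List[int], initial: int) -> List[int]:
    """Iterate the update rule, keeping every past state instead of overwriting."""
    history = [initial]  # history[t] plays the role of the state at time t
    for t, i_t in enumerate(inputs):
        history.append(update(i_t, history, t))
    return history


def sum_of_past(i_t: int, xi: List[int], t: int) -> int:
    """Hypothetical update rule: the next state depends on all earlier states."""
    return i_t + sum(xi)


states = run(sum_of_past, [1, 1, 1], 0)
```

Because `history` is only ever appended to, no past state is lost, which is the property the text attributes to referring “directly to the past”.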

McCarthy’s paper is rich and dense, and we have neither the ambition to
comment on every detail, nor to represent his current view. We simply propose in the
following how McCarthy’s Elephant 2000 programs may be re-interpreted as
represented by one of our Agents, emerging from STROBE and Phi Calculus.
5 C+C: Steps towards a Specification

The name C+C comes from the intuition that a new language for Interaction may 
emerge from observing historical reflections from two sources: Cybernetics as 
Communication and Control in animals and machines [28] and Philosophy, with the 
Speech Act Theory pioneered by Austin and Searle and the large contributions, from 
Aristotle on, towards a definition of rationality in human behavior [29].

McCarthy adopts, and correctly limits, his inspiration from philosophers of
language when he writes: “Taking the design stance in the concrete way
needed to allow programs to use speech acts tends to new views on the philosophical
problems that speech acts pose ... we can incorporate whatever abstract analogs of
these notions we find useful ... the philosophical investigations have resulted in ideas
useful for our purposes ...”. Therefore, it seems today healthy and well motivated
for a computer scientist, and not only for a researcher in Artificial Intelligence (see
McCarthy, page 2: “Elephant 2000 is at the borderline of AI, but the article
emphasizes the Elephant usages that do not require AI”), to look at philosophers’
work and import whatever may be useful for our purposes. Ironically, what seemed
years ago a theoretical exercise far from any potential applicability becomes more
and more the necessary foundation for the most concrete expected evolution of
societies worldwide: the availability and the use of the Web.

We envisage C+C to be our proposed language for programming the Web. C+C
programs are like Elephant 2000 programs, but they are called Agents because they
behave autonomously with respect to any external observer. All observers of C+C
Agents are external, except the Agent that builds an Agent: this is an internal
observer. Let us call Pagent (Proprietary Agent) the Agent that builds an Agent A.
Pagent is the only Agent that may logically verify A’s behaviour, while it may never
logically verify any other Agent’s behaviour; otherwise we would again
fall into the trap of a single memory or authority over the Agent’s evolution, the one
typical of distributed computing or blackboard expert systems.

C+C Agents, being programs, are each sequential and discrete. The clock that 
they have is proprietary, and scans operations occurring within the Agent; therefore 
operations may receive an explicit time tag, that is only recognized by the Agent. A 
C+C Agent - similarly to an Elephant program - operates in a loop that is active from 
its birth to its death. The Agent’s operation is described by a single formula: 

$\xi_p(t_p + 1) = \mathrm{update}(\text{input from world}(t_p), \xi_p, t_p)$; and output to world $(t_p + 1)$

with: $p$ = pagent name; $t_p$ is the time of operation of Agent $p$;

update = function applied by each agent = worldf of Elephant.

There is no difference in representation between a program and the world: both are
Agents and both behave asynchronously by exchanging messages:

world = set of distributed, concurrent Agents.

Each Agent may also send messages to Self, thus modelling a kind of “reminder”.
This feature may be useful, for instance, to be aware of commitments:

input from world (0) = start message for each Agent.
The STROBE synthetic (functional) representation of Exchange n was as follows.



$E_n$:   $i_n \rightarrow ((g_n\ i_n)\ e_{Qn}) \Rightarrow o_n$   ($M_{nP}$)

$i_{n+1} \Leftarrow ((f_n\ o_n)\ e_{Pn}) \leftarrow o_n$   ($M_{nQ}$)



At time $n$ Agent Q (Right Hand Side) selects Act $i_n$ from the mailbox, coming
from Agent P (Left Hand Side), and reacts applying the function $g_n$ in Environment
$e_{Qn}$; the resulting Act $o_n$ is sent to Agent P that at time $(n + 1)$ applies the function
$f_n$ in environment $e_{Pn}$ yielding Act $i_{n+1}$ that is sent to Q ...

From the description, one may think that time is a shared variable. The formulas
described above may at most model synchronous mutual calls of procedures,
possibly represented as synchronous objects. In the formula, there is also no
explicit representation of the input mailbox available to Agents, nor of the potentially
multiple sending of messages as a result of the evaluation of a single message. The
representation is poor; at most it may model a two-agent conversation. As reported
in [21], however, two-agent conversations may not account for “autonomy”, as one
may suppose that each Agent communicates synchronously with the partner: if the
knowledge available to one Agent may be reduced to the one acquired from the other
Agent, we return to a closed-world assumption. The minimum for a truly autonomous
behaviour of Agents is reached when three autonomous, asynchronous Agents
communicate.

In order to depict in a yet simpler and clearer way what our Agent architecture is
in C+C, let us consider the simplified architecture of Actors available in [30].

Actors consult the Mailbox at each cycle. Their scheduling algorithm may vary 
from language to language but it is fixed during the temporal evolution of the Actor’s 
knowledge. They may generate new Actors, replace themselves, send messages to 
other actors. 

From this model, we have produced the Agent’s model by implementing a 
scheduling algorithm that is variable with the evolution of the Agent’s behavior. In 
order to ensure the Agent’s autonomy, it is necessary and sufficient to view Agents as 
Actors with a dynamic scheduling algorithm [31][25].

The difference between Actors and Agents consists of the fact that Agents do not 
have a fixed, externally defined scheduling algorithm. Thus, the scheduling algorithm 
of Agents may be represented as a function that selects a message out of the mailbox 
at the moment the Agent operates, using any information in the Environment that is 
useful for an autonomously convenient choice. 
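The contrast between a fixed and a dynamic scheduling algorithm can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the scheduler is assumed to return the index of the chosen message, and the names `fifo_scheduler`, `preferred_partner_scheduler` and the `"preferred"` key are hypothetical.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Environment = Dict[str, object]
Scheduler = Callable[[List[Message], Environment], int]


def fifo_scheduler(mailbox: List[Message], env: Environment) -> int:
    """Actor-like: a fixed policy that ignores the Environment."""
    return 0


def preferred_partner_scheduler(mailbox: List[Message], env: Environment) -> int:
    """Agent-like: the choice depends on what the Environment currently holds."""
    for index, msg in enumerate(mailbox):
        if msg["sender"] == env.get("preferred"):
            return index
    return 0  # fall back to the oldest message


def select(mailbox: List[Message], env: Environment, scheduler: Scheduler) -> Message:
    """Remove and return the message picked by the scheduler."""
    return mailbox.pop(scheduler(mailbox, env))


mailbox = [{"sender": "P", "act": "ask"}, {"sender": "Q", "act": "tell"}]
env: Environment = {"preferred": "Q"}

first = select(list(mailbox), env, fifo_scheduler)
chosen = select(list(mailbox), env, preferred_partner_scheduler)
env["preferred"] = "P"  # the Environment evolves, and so does the selection
later = select(list(mailbox), env, preferred_partner_scheduler)
```

The same mailbox yields different selections as the Environment changes, which is the sense in which the Agent’s scheduling is not fixed from outside.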

In terms of functional (Scheme) programming, an Agent applies a function
Scheduler to the input mailbox Inbox in Environment Env at each loop; the message
selected, Msg, is passed to the operational Object associated with the Agent, together
with the Environment dedicated to the partner that has sent Msg. As Objects may be
functionally modelled as a Dispatch function on Selectors of Messages, from the
moment that the Agent becomes an Object, all properties of Objects are applicable.
For instance, there may be more than one Selector. In our case we may envisage at
least two Selectors in Msg: one for the Ontology and one for the Method.

AgentLoop (Scheduler_A Mailbox_A Env_A) ->
   ;;; Phase 1: <selecting the Partner P>
(Dispatch_A Msg_P Env_AP) ->
   ;;; Phase 2: <selecting Ontology O for the conversation with P>
(Dispatch_A Msg_PO Env_APO) ->
   ;;; Phase 3: <selecting Method M for the conversation with P to check Ontology O>
(Method_M arg1 arg2 ... argn Env_APO) ->
   ;;; Phase 4: <applying Method M to solve a problem within Ontology O>
(send OutMessage(s))
   ;;; Phase 5: <sending output messages> and
(goto AgentLoop) ; Env_A transformed into Env_A'
   ;;; Phase 6 = Phase 1: <looping>
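One pass through the loop above might be sketched as follows. This is a simplified illustration under stated assumptions, not the C+C specification: messages are dictionaries with hypothetical fields `sender`, `ontology`, `method` and `args`; Phase 3 only looks the method up and omits the consistency check; and the names `make_agent`, `step` and `add` are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]


def make_agent(methods: Dict[Tuple[str, str], Callable], scheduler: Callable):
    env_a: Dict[str, Dict[str, dict]] = {}  # Env_A: partner -> ontology -> Env_APO
    outbox: List[Message] = []

    def step(mailbox: List[Message]) -> None:
        # Phase 1: the private scheduler selects a message, hence a Partner P.
        msg = mailbox.pop(scheduler(mailbox, env_a))
        env_ap = env_a.setdefault(msg["sender"], {})
        # Phase 2: select the Ontology O named in the message.
        env_apo = env_ap.setdefault(msg["ontology"], {})
        # Phase 3: select the Method M for that Ontology (no checking here).
        method = methods[(msg["ontology"], msg["method"])]
        # Phase 4: apply M to its arguments in the dedicated Environment.
        result = method(*msg["args"], env_apo)
        # Phase 5: deposit the output message in the Outbox.
        outbox.append({"to": msg["sender"], "content": result})

    return step, env_a, outbox


def add(x: int, y: int, env: dict) -> int:
    """Hypothetical Method: records its last result in the Environment."""
    env["last"] = x + y
    return x + y


step, env_a, outbox = make_agent({("arith", "add"): add}, lambda box, env: 0)
mailbox = [{"sender": "P", "ontology": "arith", "method": "add", "args": (2, 3)}]
step(mailbox)  # Phase 6: a real Agent would loop back to Phase 1 here
```

Nesting the Environments by partner and then by ontology is one way to read the cascade of Dispatch applications; other layouts would serve equally well.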

Phase 1: <selecting the Partner P> 

The Agent operates by selecting one of the messages (Speech Acts à la KQML or
ACL [5]) in the Mailbox. Criteria are private: the selection function applied has full
access to the Agent’s Environment. Once the message has been selected, it identifies
a unique Sender (Partner P). The Partner name and the dedicated Environment
Env_AP are passed to Phase 2. If P is unknown, A starts a transaction dedicated to
the goal of getting to know the new Partner P (an example of laziness in Communication).

Phase 2: <selecting Ontology O for the conversation with P> 

Similarly, the explicit Ontology is selected in the Message. Clearly, this Phase 2
may include a selection of a Language for the Content of the Message, etc. in
cascade. Any field in the Message, represented as a formalized Speech Act, may be
known to the Agent or unknown. In the latter case, a transaction is initiated by A in
order to have sufficient information for continuing to process the Message; the
process is suspended and later resumed in a co-routine or delegation-like behavior.
Transactions initiated by A as a result of insufficient information on the components
of P’s message are examples of mixed initiative dialogues, which encapsulate
sub-exchanges within main exchanges.

Phase 3: <selecting Method M for the conversation with P to check Ontology O>

The evaluation in this phase includes a verification of potential disagreements 
(paradoxes) between the expected behavior of the partner and the real behavior such 
as expressed in the message. 

Phase 4: <applying Method M to solve a problem within Ontology O>

This phase assumes the Agent to have reached the Content of the Message and to
be willing to process it in the traditional Object Oriented way. Any information
necessary was obtained previously, so that this phase may be modeled as a traditional
procedure call, possibly accompanied by send operations. Parameters, as well as
Fields in the Message, may be unknown to A, e.g. when an argument is a pointer
to a Web Page, a URL. In our model, arguments with unknown value at the
Agent’s side are admitted, as the evaluation model for functional applications is a lazy
one, i.e. values are evaluated when needed. When an argument’s value is required, if
it is not available in the Agent’s Environment, it is requested from the Partner by
starting a specific transaction, embedded in the other one (an example of laziness in
control).
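The lazy treatment of arguments can be sketched with a memoised thunk. This is an illustration only, assuming that “asking the Partner” can be modelled as a local function call; the names `LazyArgument`, `ask_partner` and `page_length`, and the placeholder page content, are hypothetical.

```python
from typing import Callable, List, Optional


class LazyArgument:
    """An argument whose value is fetched only when, and if, it is needed."""

    def __init__(self, fetch: Callable[[], str]) -> None:
        self._fetch = fetch
        self._forced = False
        self._value: Optional[str] = None

    def value(self) -> str:
        if not self._forced:
            self._value = self._fetch()  # start the embedded transaction
            self._forced = True
        return self._value


requests: List[str] = []


def ask_partner() -> str:
    """Stand-in for a transaction with the Partner (here, a local call)."""
    requests.append("request: content of the referenced page")
    return "<html>...</html>"


def page_length(use_page: bool, page: LazyArgument) -> int:
    """Hypothetical Method that may or may not need its argument."""
    return len(page.value()) if use_page else 0


page = LazyArgument(ask_partner)
unused = page_length(False, page)  # the value is never requested
count_before = len(requests)
used = page_length(True, page)     # the value is requested once
again = page_length(True, page)    # and reused afterwards
```

In an asynchronous setting the call to `_fetch` would suspend the evaluation rather than block; the sketch keeps only the “evaluate when needed, at most once” aspect.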

Phase 5: <sending output messages> 

Sending a Message out means depositing it in an Outbox, where an asynchronous 
handler cares for the physical transmission to receivers. 



5.1 Discussion

This view of Agents accounts for several apparently separate problems and issues. 

The computational nature of Agents. Agents are neither Objects nor Actors: they
are “crazy” or “skilled” Actors according to the behaviour emerging from the
message selection process. In itself, the process is highly non-deterministic: during
the time needed for dealing with the Agent’s loop, other messages may reach the
Mailbox, and there is no certainty that all of them will eventually be processed. The
Actor model is that of an Operating System, a server of clients, and its objective is
to serve fairly the clients’ needs by delivering services. The Agent’s model is the
contrary: an Agent serves its own goals, deciding autonomously if and when to
dedicate its time to any of the partners.

Agents as looping schedulers. Just as an Object’s behaviour is totally dependent
on its dispatching algorithm, an Agent’s behaviour depends on its Scheduling
algorithm. However, the dispatching algorithm of an Object is fixed, while the
scheduling of an Agent may vary with time, intrinsically as an effect of modifications
in the Environment and extrinsically as an effect of the occurrence of new messages
in the Mailbox. An Agent is a totally undetermined computational entity: we know
what it is at the beginning, but we cannot foresee what it becomes as a result of
interactions.

Ambiguity in Agent’s behaviour. The scheduling algorithm is both similar to and
different from a non-deterministic “amb” evaluator [32] that may be a neat model of
non-deterministic search such as the one realised in the Knowledge Systems
literature. The similarity is that one and just one message is chosen at each loop. The
difference is that even if the conversation engaged as a consequence of the message
selected ends in a failure, selecting other messages does not ensure any
“backtracking” as in search, be it with or without dependency or truth maintenance.
The past is never forgotten: the effects of a dialogue cannot be undone [1,21]. Thus,
assignments to variables must be realised by keeping the history of values, e.g.
binding names to streams of values in Environments.
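Binding names to histories of values, rather than to a single overwritable cell, can be sketched as follows. The class and method names (`HistoryEnvironment`, `assign`, `lookup`, `history`) are hypothetical, and a Python list stands in for the stream of values.

```python
from typing import Dict, List, Tuple


class HistoryEnvironment:
    """An Environment in which assignment appends instead of overwriting."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[object]] = {}

    def assign(self, name: str, value: object) -> None:
        # The previous values are kept: the past is never forgotten.
        self._bindings.setdefault(name, []).append(value)

    def lookup(self, name: str) -> object:
        """Current value: the most recent element of the history."""
        return self._bindings[name][-1]

    def history(self, name: str) -> Tuple[object, ...]:
        """All values ever bound to the name, oldest first."""
        return tuple(self._bindings[name])


env = HistoryEnvironment()
env.assign("price", 10)
env.assign("price", 12)
env.assign("price", 9)
current = env.lookup("price")
past = env.history("price")
```

Ordinary lookup behaves as with destructive assignment, while the full sequence of earlier bindings remains available to any method that needs to reason about the conversation so far.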

Lazy control and Lazy communication are realised in C+C by admitting
non-strictness of any component of messages. When some information is not known, a
transaction is started aiming at obtaining the value corresponding to the expression
denoting the information. The aphorism is: when in doubt, do not compute, ask (and
wait: do something else ...). Someone will resume your suspended evaluation
at some point, providing you with what you need in order to go on.

We have briefly described the necessity, not only the convenience, of laziness in
Web computing in [7]. The arguments analyse Web Documents observing that tags
may be considered explicit types and subtypes of instances of complex data types 
with a tree structure. From this observation, the link to a dynamically typed language 
is immediate, thus to a memory model such as our Environment, linking non-typed 
names with explicitly typed simple or compound data. Yet, functions (constructors, 
selectors, predicates, ...) proprietary of the ADT relative to Web Documents are not 
necessarily within the Document, nor at the site of the Agent that needs the 
document. In order to access and use Web documents, thus, we need not only to 
access the documents but also to find the ADT definitions that help us in interpreting 
the document. Notice that by definitions we do not simply mean the definition of the 
structure (e.g. the DTD in XML), but also any further information necessary to have a 
semantic model of the document valid for the user - not just for the producer -. 

Messages (Speech Act-like) are likewise instances of Abstract Data Types, which
include a “content” subtype with the specified content of the message. We may
distinguish elementary messages and compound messages, i.e. messages that include
other messages. Finding out which messages are elementary and which are compound
is the most difficult task. We have noticed that programming languages such as
Scheme do include elementary “speech acts” (see [21]), as do Excel and any
interactive programming or system language, such as the UNIX shell. A deeper study
[23] has re-classified KQML Speech Acts as consisting of variations of 3 basic
classes: tell-like, query-like and answer-like. However, the “effects on the partner
Agent” considered were at the simplest level: essentially reading from or writing
to Memory. If the considered effects are enhanced, as is the case envisaged by
Phi Calculus [11-16], then it is necessary to adopt a constructive view of complex
messages (Speech Acts) that covers Agent communication phenomena such as those
referenced by asserted modalities (can, know, have-to, want, may, hope-wish,
believe, ...); negated modalities (not-believe etc.); extended performatives (see
McCarthy’s Elephant 2000 preliminary list) and time. We will not take a logic
approach, however: we know of many logic approaches that led to
unmanageable languages and systems. Rather, we will use a truly conversational
approach of joint construction of complex Acts from simpler ones, where simple and
traditional principles of software engineering (such as those used in extensible
languages) will be used.

Finally, Agents and Messages may be combined in modelling simple, but 
significant transactions. From the processes generated, more properties of the 
necessary and sufficient Acts will be discovered / constructed in an experimental 
cycle reflecting the theoretical framework of Phi Calculus: rationality, adequacy and 
learning in real situations such as those evaluated by real users. 
6 Conclusions

The paper has reported on ongoing research of two groups that have joined their
efforts for the ambitious task of offering a generic theoretical and experimental
framework for developing interactive software, such as the one necessary for
Distance Learning and Electronic Commerce. Focussing on the developments around
a new Agent Language, called C+C, the paper has reported on and discussed
extensively a foundational paper about a language proposal by J. McCarthy. Inspired
by that paper, the previous work of the authors has been reviewed and new directions
have been presented for the realisation of C+C.

The essence of the paper consists in specifying the components of C+C: the
traditional Control elements, such as those in Objects; the “new”
Communication elements, such as Speech Acts in messages; and the Interactive
supervision ones, such as the interrelation between Communication and Control. These
are combined into a homogeneous architecture supported by a number of simple arguments.

We are convinced that it is not just the final result (the language C+C we wish to
develop and use) but the constructive, interactive process that counts in achieving
significant scientific results. Therefore, we adopt the role and behaviour that we
attribute to our Agents and, having made the first move towards potential partner
Agents, in a Conference, we confidently await their reactions, which will hopefully help
us in better obtaining the envisaged results.
Abstract. We present a sound and complete, tractable inference method
for reasoning with localized closed world assumptions (LCWA’s) which
can be used in applications where a reasoning or planning agent cannot
assume complete information about planning or reasoning states.
This Open World Assumption is generally necessary in most realistic
robotics applications. The inference procedure subsumes that described
in Etzioni et al. [9], and others. In addition, it provides a great deal
more expressivity, permitting limited use of negation and disjunction in
the representation of LCWA’s, while still retaining tractability. The
approach is based on the use of circumscription and quantifier elimination
techniques, and inference is viewed as querying a deductive database.
Both the preprocessing of the database using circumscription and
quantifier elimination, and the inference method itself, have polynomial time
and space complexity.



1 Introduction 

Traditionally, classical reasoning and planning techniques have been developed 
for environments in which the reasoning agent is assumed to have complete 
information about the world in which it is embedded and the only changes to 
the world are the effects which result from the agent’s invocation of actions. 
Under this assumption, an efficient means of representing negative information 
about the world in each planning or reasoning state is to apply the Closed World 
Assumption (CWA) [1,16]. In this case, information about the world, absent in 
a state, is assumed to be false. 

In many realistic applications, in particular robotics applications, the
assumption of complete information is not feasible and the CWA cannot be used.
For example, an unmanned aerial vehicle (UAV) flying over a region cannot have a
complete model of the region. New objects are continually sensed or encountered and
agents other than the UAV agent cause change in the region. In applications such
as this, an Open World Assumption (OWA), where information not known by the
agent is assumed to be unknown, is the ontologically right choice to make, but
complicates both the representational and implementational aspects associated
with inference mechanisms and the use of negative information.

* The authors are supported in part by a basic research grant from the Wallenberg
Foundation, Sweden.

The CWA and the OWA represent two extremes. Quite often, a reasoning 
agent has information which permits the application of the CWA locally. If the 
UAV agent has a camera sensor, the agent can assume complete information 
about objects in the focus of attention (FOA) of the camera; for example, the 
only cars in the FOA are those identified by the image processing module. 

The research issue, then, is to find maximally expressive, but tractable
inference mechanisms for local closed world reasoning which can be integrated with
deliberative components, such as planning algorithms, used in applications where
the OWA applies. An additional issue is to be able to dynamically modify the
degree of closed-worldness relative to the dynamics of the application at hand.

We approach the problem as follows. The starting point is the approach to
LCWA described in [9], where the authors present a sound, but incomplete,
tractable algorithm for LCWA intended for use in the XII Planner. Briefly,
their approach works as follows. Assume an actual world $w$ which can be
represented by a complete logical theory. Since the reasoning agent only has
incomplete information about that world, but that information is assumed correct,
the agent’s knowledge can be represented as a set of possible worlds $S$, where
$w \in S$. For reasons of tractability, the approach approximates $S$ by representing
it as a set of ground literals, $M$, where negative information about $w$ known
to the agent is represented explicitly. $M$ can be viewed as the agent’s
knowledge database. Localized closure information is represented in another database,
$\mathcal{L}$, as a set of formulas restricted to be conjunctions of literals (not necessarily
grounded). For example, $M = \{parent.dir(ecai.tex, /ecai00),\ size(kr.tex, 100)\}$,
$\mathcal{L} = \{LCW(parent.dir(f, /ecai00))\}$. Although a reasoning agent could not
infer that it knows about all the files in all directories and their sizes, it can infer
that it knows about all the files in the directory ecai00. In [9], the authors
describe an algorithm which encodes a sound, but incomplete inference relation,
$M, \mathcal{L} \models_s \alpha$, where given $M$ and $\mathcal{L}$, they can determine whether a conjunction of
positive ground literals, $\alpha$, is inferable under partial closure of the theory. Since
both $\mathcal{L}$ and $\alpha$ are restricted to be positive conjunctive formulas, the algorithm
and its efficiency are based on the use of matching conjunctive queries against
a conjunctive database. Note that due to the OWA, for a specific query $\alpha$, the
algorithm may return true, false, or unknown. In the following, we use QLCW
to refer to the query language of which $\alpha$ would be an instance.
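The three possible answers (true, false, unknown) can be illustrated on the file-system example with a deliberately simplified sketch. It is not the algorithm of [9]: it handles only single ground atoms, represents atoms as tuples of strings, marks variables in closure patterns with a leading `?`, and uses `None` for unknown. All function names are hypothetical.

```python
from typing import FrozenSet, List, Optional, Tuple

Atom = Tuple[str, ...]


def matches(pattern: Atom, atom: Atom) -> bool:
    """A pattern matches an atom if they agree wherever the pattern is ground."""
    return len(pattern) == len(atom) and all(
        p == a or p.startswith("?") for p, a in zip(pattern, atom)
    )


def query(atom: Atom, positive: FrozenSet[Atom], negative: FrozenSet[Atom],
          lcw: List[Atom]) -> Optional[bool]:
    """Answer True, False, or None (unknown) for a single ground atom."""
    if atom in positive:
        return True
    if atom in negative:
        return False
    if any(matches(pattern, atom) for pattern in lcw):
        return False  # locally closed: absent from the database means false
    return None       # open world: nothing can be concluded


positive = frozenset({
    ("parent.dir", "ecai.tex", "/ecai00"),
    ("size", "kr.tex", "100"),
})
negative: FrozenSet[Atom] = frozenset()
lcw = [("parent.dir", "?f", "/ecai00")]  # all files in /ecai00 are known

known_file = query(("parent.dir", "ecai.tex", "/ecai00"), positive, negative, lcw)
closed_dir = query(("parent.dir", "kr.tex", "/ecai00"), positive, negative, lcw)
open_dir = query(("parent.dir", "kr.tex", "/home"), positive, negative, lcw)
```

The second query is answered negatively only because the closure statement covers the directory in question; the third falls outside any closure statement and therefore stays unknown.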

We substantially extend the approach of [9] by:

- providing a semantics for the case where LCW constraints in $\mathcal{L}$ and queries
  in QLCW are expressed by arbitrary first-order formulas. The semantics is
  based on the use of circumscription. The new semantics and the one given
  in [9] agree on the special case where conjunctions of positive literals are
  used in $\mathcal{L}$ and QLCW.

- isolating a more expressive language for LCW constraints in $\mathcal{L}$ which
  subsumes that used in [9], permits limited use of negation and disjunction, and
  still retains tractability.

- providing a sound and complete, tractable deduction method for the more
  expressive language. Observe that in [9] completeness is not guaranteed even
  in the case of a language with conjunctions of positive literals only.

Our approach to the problem is based on the use of circumscription to
minimize formulas in $\mathcal{L}$ in the context of the theory $\mathcal{L} \wedge M$. Using quantifier
elimination techniques, the original circumscribed theory can be reduced to a first-order
or fixpoint formula. Viewing the reduced theory as a database query, inference
relative to $M$ can be viewed as a query to a database. Restricting the
expressivity of $\mathcal{L}$ to what we call semi-Horn formulas, $M$ to a conjunction of positive
and negative ground literals, and queries in QLCW to semi-Horn formulas, we
can show that both the theory reduction technique and querying technique
remain tractable and safe. Tractability means that the method allows for efficient
(PTIME) computations. Safety means that no inconsistencies are introduced by
the method no matter what logical dependencies are used in $\mathcal{L}$.

Note that by first providing a general framework and semantics for
structuring the problem in a classical setting and then isolating tractable combinations
of fragments of the languages used in $M$, $\mathcal{L}$, and QLCW, we provide a
methodology for generalization of the technique based on the use of results from the
knowledge representation and deductive database communities.

2 Preliminaries 

We deal with an ordinary first-order language with equality, $L_1$, over a fixed
alphabet $A$ without function constants. By $L_2$ we denote the second-order
language based on an alphabet whose symbols are those of $A$, together with a
denumerable set of n-ary predicate variables (for each n > 0). These will be
denoted by the letters $\Phi$ and $\Psi$, possibly with subscripts and/or primes.

In the sequel, we shall use second-order circumscription. Our definition
follows [13].

Definition 1. Let P be a tuple of distinct predicate constants, S be a tuple 
of predicate constants disjoint with P, and let T(P,S) be a finite theory in a 
language $L_1$. The second-order circumscription of P in T(P,S) with variable S, 
written CIRC(T(P,S); P; S), is the sentence (in the language $L_2$) 

$$T(P,S) \wedge \forall \Phi \forall \Psi.\, \bigl[\, [T(\Phi,\Psi) \wedge [\Phi \leq P]] \supset [P \leq \Phi] \,\bigr]$$

where $\Phi$ and $\Psi$ are tuples of predicate variables similar to P and S, 
respectively^1, and $\Phi \leq P$ (resp. $P \leq \Phi$) stands for 
$\bigwedge_{i=1}^{n} [\forall \bar{x}.\, \Phi_i(\bar{x}) \supset P_i(\bar{x})]$ (resp. 
$\bigwedge_{i=1}^{n} [\forall \bar{x}.\, P_i(\bar{x}) \supset \Phi_i(\bar{x})]$). □ 

^1 A tuple of predicate expressions X is said to be similar to a tuple of predicate 
constants Y iff $X = (X_1, \ldots, X_n)$, $Y = (Y_1, \ldots, Y_n)$ and, for all $1 \leq i \leq n$, 
$X_i$ and $Y_i$ are of the same arity. 

In the following, we shall often write CIRC(T; P; S) instead of the formula 
CIRC(T(P,S); P; S). 

Let us now quote the fixpoint theorem formulated and proved in [15] for 
second-order quantifier elimination. 

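Theorem 1. Let P be a predicate variable, and let $\Phi(P)$, $\Psi(\neg P)$ and $\Psi(P)$ be 
formulas without second-order quantification. Let $\Phi(P)$ be positive w.r.t. P, 
$\Psi(\neg P)$ be negative w.r.t. P and $\Psi(P)$ be positive w.r.t. P. Then 

$$\exists P\, \forall \bar{y}\, [\Phi(P) \supset P(\bar{y})] \wedge [\Psi(\neg P)] \;\equiv\; \Psi[P \leftarrow \mu P(\bar{y}).\Phi(P)], \qquad (1)$$

$$\exists P\, \forall \bar{y}\, [P(\bar{y}) \supset \Phi(P)] \wedge [\Psi(P)] \;\equiv\; \Psi[P \leftarrow \nu P(\bar{y}).\Phi(P)], \qquad (2)$$

where the above substitutions exchange the variables bound by fixpoint operators 
by the corresponding actual variables of the substituted predicate. Equivalence (1) 
(resp. (2)) is used to minimize (resp. maximize) P. □ 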
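
As a small worked instance of (1), offered as an illustrative sketch rather than 
an example taken from [15], take $\Phi(P)$ to be $y = a$ (which does not mention P) 
and $\Psi(\neg P)$ to be $\neg P(b)$, where a and b are hypothetical constants. Since 
$\Phi$ contains no occurrence of P, the least fixpoint $\mu P(y).\Phi$ is just $\Phi$ 
itself, and the right-hand side is a classical first-order formula: 

```latex
\exists P\, \bigl[\, \forall y\, [\, y = a \supset P(y) \,] \wedge \neg P(b) \,\bigr]
\;\equiv\; \neg P(b)\,[P \leftarrow \mu P(y).\,(y = a)]
\;\equiv\; \neg (b = a).
```

Intuitively, the least relation P satisfying the first conjunct is {a}, and it 
avoids b exactly when a and b denote different objects. 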

The definition of semi-Horn formulas, for which Theorem 1 is applicable, has 
been introduced in [5]. In what follows we shall consider a restricted version 
of semi-Horn formulas, where the recursive part of the semi-Horn formula is 
restricted as to the use of universal quantifiers. 

Definition 2. By a semi-Horn formula (w.r.t. Q) we understand a conjunction 
of formulas of the form 

$$[\Phi(\bar{x}) \supset Q(\bar{x})] \wedge \Psi(\neg Q), \qquad (3)$$

and 

$$[Q(\bar{x}) \supset \Phi(\bar{x})] \wedge \Psi(Q), \qquad (4)$$

where $\Phi(\bar{x})$ is any classical first-order formula positive w.r.t. Q and $\Psi(\neg Q)$ 
($\Psi(Q)$) is any first-order formula negative (positive) w.r.t. Q. Formula 
$\Phi(\bar{x}) \supset Q(\bar{x})$ ($Q(\bar{x}) \supset \Phi(\bar{x})$) is called the recursive part of (3) ((4)) 
and $\Psi(\neg Q)$ ($\Psi(Q)$) is called the negative (positive) part of (3) ((4)). 

By a semi-Horn formula we understand a semi-Horn formula w.r.t. all predicate 
symbols occurring in the formula. □ 

3 Representing an Agent’s Knowledge 

Suppose IT is a complete logical theory formalizing what is true in an actual 
world state w. Suppose also that T is a finite first-order theory, i.e. a finite set 
of sentences from $L_1$, formalizing an agent’s knowledge about w. Following [9], 
we say that an agent has local closed-world information w.r.t. a formula $\alpha$ and T 
iff 

$T \models \alpha\theta$ or $T \models \neg\alpha\theta$ for each ground substitution $\theta$. 
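
The condition above can be sketched in code under a strong simplifying assumption 
that is not part of the paper: T is a consistent set of ground literals and the 
formula is a single atom, so that entailment of a ground instance reduces to 
membership. The function name `has_lcw` and the toy facts are hypothetical. 

```python
from itertools import product

def has_lcw(pred, arity, constants, literals):
    """Return True if every ground instance of `pred` is decided by `literals`.

    `literals` is a set of (sign, predicate, args) triples; sign is True for a
    positive literal and False for a negative one. Assumes `literals` is
    consistent, so membership stands in for entailment.
    """
    for args in product(constants, repeat=arity):
        known_true = (True, pred, args) in literals
        known_false = (False, pred, args) in literals
        if not (known_true or known_false):
            return False
    return True

# Hypothetical toy theory: Moving is decided for c1 and c2 only.
T = {
    (True, "Moving", ("c1",)),
    (False, "Moving", ("c2",)),
}
print(has_lcw("Moving", 1, ["c1", "c2"], T))        # True
print(has_lcw("Moving", 1, ["c1", "c2", "c3"], T))  # False
```

Adding the constant c3 breaks local closure because neither Moving(c3) nor its 
negation is known. 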

It is assumed that any knowledge the agent infers from T is correct in the actual 
world w. Since T provides only incomplete information about w, not all facts 
about w are known to the agent. In other words, only some of the information 
is locally closed relative to T; other information is unknown. 

Following [9], we approximate an agent’s knowledge about T by a pair M, $\mathcal{L}$, 
where M is a finite set of positive or negative ground literals and $\mathcal{L}$ is a set of 
first-order formulas, representing local closed-world assumptions. We assume that 
if T formalizes the agent’s knowledge about the world, then for each formula $\alpha$ 

$M \models \alpha$ implies $T \models \alpha$. 

Let $c_1, \ldots, c_n$ be all the constants from the alphabet under consideration. 
We write DCA(M) to denote the domain closure axiom for a theory M. This 
is the formula 

$$\forall x.\ \bigvee_{i=1}^{n} x = c_i.$$

We write UNA(M) to denote the unique name assumption axiom for a theory M. 
This is the formula 

$$\bigwedge_{1 \leq i < j \leq n} c_i \neq c_j.$$

We write $M, \mathcal{L} \models \alpha$ to denote that a formula $\alpha$ follows from a pair M, $\mathcal{L}$. 
This notion is defined as follows. 

Definition 3. Let M, $\mathcal{L}$ be a finite set of ground literals and a set of formulas 
representing closed-world information, respectively. Suppose that $\mathcal{L}$ consists of 
formulas $\beta_1, \ldots, \beta_n$. Let $R = R_1, \ldots, R_n$ be a set of new predicate symbols 
similar to $\beta_1, \ldots, \beta_n$.^2 By an LCW-based extension of M, denoted by M', we 
understand the theory consisting of the formulas of M, augmented by: 

- DCA(M) and UNA(M) 

- the set of formulas $\forall \bar{x}.\, R_i(\bar{x}) \equiv \beta_i$ ($i = 1, \ldots, n$). □ 

The following definition provides us with the semantics of LCW as under- 
stood in this paper. 

Definition 4. Let S be the set of all predicate symbols occurring in $\beta_1, \ldots, \beta_n$. 
Then 

$M, \mathcal{L} \models \alpha$ iff $CIRC(M'; R; S) \models \alpha$, 

where $R = (R_1, \ldots, R_n)$. □ 

Note that Definition 4 provides the general case and semantics for reasoning 
under the LCWA. The rest of the paper considers restrictions on M, $\mathcal{L}$ and 
QLCW which make reasoning under the LCWA tractable. 



^2 A predicate symbol P is similar to a formula $\alpha$ iff the arity of P is equal to the 
number of free variables of $\alpha$. 

In what follows we divide M' into three parts: 

- a positive part, denoted by $M_+$, consisting of positive literals of M; the 
positive part is intended to gather positive information directly included in 
the database M 

- a negative part, denoted by $M_-$, consisting of negative literals of M; the 
negative part is intended to gather negative information directly included in 
the database M 

- an LCW part, denoted by $M_{\mathcal{L}}$, consisting of equivalences $\forall \bar{x}.\, R_i(\bar{x}) \equiv \beta_i$ 
($i = 1, \ldots, n$) introduced in Definition 3. 

Observe that $M_+$ is just an extensional database as understood in the field of 
deductive databases (see, e.g. [1,8]). Also $M_-$ can be easily treated as a part of 
an extensional database. Now (deductive) queries are represented by $M_{\mathcal{L}}$ embedded 
in a tractable query language like fixpoint calculus or classical first-order 
logic (see e.g. [1]). Thus, whenever LCW is polynomially reducible to fixpoint or 
classical formulas, one has a tractable reasoning mechanism. 

In what follows we often call $M_+ \cup M_-$ simply a database. 

4 The Main Result 

The following theorem provides us with a sufficient condition which guarantees 
that second-order quantifiers can be eliminated from CIRC(M'; R; S) using the 
fixpoint theorem (Theorem 1) and some syntactic transformations applied in the 
DLS algorithm [6]. 

The main result of this paper, formulated below, shows that the second-order 
formula resulting from circumscription can be reduced to a fixpoint formula. 
Thus the complexity of reasoning is polynomial in the size of M. This follows 
from the fact that the database part of M is not affected by the quantifier 
elimination process. Only LCW constraints can, in some cases, introduce additional 
complexity. However, the size of the resulting formula is, in the worst case, not 
greater than m + O(n^), where n is the size of LCW constraints together with 
the query and m is the size of the database. 

In the proof of the theorem we use second-order quantifier elimination (for 
surveys of approaches to second-order quantifier elimination, consult [6,14]). 
Because of space limitations, the proof is not included in the current paper, 
but is available from the authors. 

Theorem 2. Let CIRC(M'; R; S) be defined as in Section 3. If M consists of 
literals and LCW constraints in $\mathcal{L}$ are defined by means of semi-Horn formulas, 
then the following conditions hold: 

- second-order quantifiers can be eliminated from CIRC(M'; R; S); 

- if the size of the database M is m and the size of $M_{\mathcal{L}}$ together with the 
query is n then, in the worst case, the resulting formula has size m + O(n^). 
Moreover M is not affected by the quantifier elimination process. 

The second-order quantifier elimination technique applied in the proof of 
Theorem 2 is based on Theorem 1 and provides us also with definitions of the 
eliminated predicates. As in [8], this feature is crucial for the approach we present 
in this paper. More precisely (for details see e.g. [8]): 

- in the case of formulas of the form (1), one gets an explicit definition of the 
least relation P satisfying the first-order part of (1); and 

- in the case of formulas of the form (2), one gets an explicit definition of the 
greatest relation P satisfying the first-order part of (2). 

Observe that Theorem 2 can still be generalized using techniques of [6,7,8,14]. 

Corollary 1. Consider a relational (or deductive) database in which the query 
language QLCW is the classical first-order logic or monotone fixpoint calculus^3. 
If M consists of literals and the LCW constraints in $\mathcal{L}$ are defined by means of 
semi-Horn formulas, then: 

- the time complexity of the quantifier elimination algorithm is polynomial in 
the size of the input query; 

- the formula resulting from the quantifier elimination process is a monotone 
fixpoint formula, thus time and space data complexity of querying the 
database is polynomial in the size of the database; 

- if all $\Phi$'s occurring in recursive parts of semi-Horn formulas defined in 
Definition 2 (i.e. in formulas of the form $\Phi(\bar{x}) \supset Q(\bar{x})$ and $Q(\bar{x}) \supset \Phi(\bar{x})$) do 
not contain Q's, then the formula resulting from the quantifier elimination 
process is a classical first-order formula. Thus, assuming that the query 
language is restricted to the classical first-order logic, one obtains polynomial 
time data complexity and polylogarithmic space data complexity [1,12]. 

Proof. The first item easily follows from the results provided in [5,8] and from 
the proof of Theorem 2. 

The second item just quotes results well-known from deductive databases 
(see e.g. [1,12]). 

The last item follows from the fact that for such formulas the Ackermann 
Lemma [2] is applicable - see also [6]. 

5 The LCW Algorithm 

Theorem 2 together with results in [8] provides us with a complete and tractable 
algorithm for deduction from a database M and LCW database $\mathcal{L}$, assuming 
that formulas in $\mathcal{L}$ and $\alpha$ are formulated as semi-Horn formulas and M consists 
of literals. An un-optimized abstract version of the algorithm is shown below: 

Function LCWQuery($\alpha$, M, $\mathcal{L}$): 3-boolean 
    $M_{\mathcal{L}}$ := M $\cup$ Reduce($\mathcal{L}$) 
    if RQuery($M_{\mathcal{L}}$, $\alpha$) $\neq \emptyset$ then return T 
    else if RQuery($M_{\mathcal{L}}$, $\neg\alpha$) $\neq \emptyset$ then return F 
    else return U; 
end. 

Reduce() is the quantifier elimination technique described in [8] extended 
with the result from Theorem 2. It is assumed that Reduce() provides us with 
definitions of the predicates eliminated from CIRC(M'; R; S). 

RQuery() is based on [12] and returns a set of tuples satisfying $\alpha$. However, 
the CWA is not assumed by RQuery(). 

^3 I.e. a calculus in which fixpoints are defined on monotone formulas; see e.g. [1]. 
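
The control flow of LCWQuery can be sketched as follows. This is only an 
illustration of the three-valued answer logic: `toy_reduce` and `toy_rquery` 
are hypothetical stand-ins and do not implement the procedures of [8] or [12], 
and the literal encoding is an assumption made for the example. 

```python
from enum import Enum

class Truth(Enum):
    T = "true"
    F = "false"
    U = "unknown"

def negate(literal):
    """Flip the sign of a (sign, atom) literal."""
    sign, atom = literal
    return (not sign, atom)

def lcw_query(alpha, database, constraints, reduce_fn, rquery_fn):
    """Three-valued query following the shape of LCWQuery."""
    reduced = set(database) | set(reduce_fn(constraints))
    if rquery_fn(reduced, alpha):          # non-empty answer set
        return Truth.T
    if rquery_fn(reduced, negate(alpha)):  # the negation has an answer
        return Truth.F
    return Truth.U

# Hypothetical stand-ins, NOT the procedures of [8] or [12].
def toy_reduce(constraints):
    """Pretend the constraints were already reduced to ground literals."""
    return constraints

def toy_rquery(reduced, literal):
    """Return the set of answer tuples; for a ground literal, {()} or empty."""
    return {()} if literal in reduced else set()

M = {(True, ("Moving", "c1"))}
L = {(False, ("Moving", "c2"))}

for car in ("c1", "c2", "c3"):
    query = (True, ("Moving", car))
    print(car, lcw_query(query, M, L, toy_reduce, toy_rquery).name)
```

The third query returns U rather than F because neither the literal nor its 
negation is available, which mirrors the absence of a global closed-world 
assumption in RQuery(). 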

6 Related Work 

In this section, we show that our approach subsumes the approaches proposed 
in [9,3] and [10]. 

In [9] it is assumed that the LCW database, $\mathcal{L}$, consists of formulas that are 
conjunctions of atoms. We write $M, \mathcal{L} \models_E \alpha$ to denote that a formula $\alpha$ follows 
from a pair M, $\mathcal{L}$ in the approach of Etzioni et al. [9]. The following theorem holds. 

Theorem 3. For all M and $\mathcal{L}$ 

$M, \mathcal{L} \models_E \alpha$ implies $M, \mathcal{L} \models \alpha$. □ 

Similarly, the $\psi$-forms considered in [3] are simply expressible in the language 
we deal with. Moreover, the semantics of both approaches is equivalent when 
restricted to the $\psi$-forms only. 

In fact, the approach presented in [3] is subsumed by the one provided in [10]. 
In [10] Horn clauses, with additional built-in predicates, are used to express LCW 
constraints. These are easily expressible in our approach as we deal with semi- 
Horn formulas that are substantially more expressive than Horn clauses. 

Note that the subsumption results are related to reasoning in a static state 
under the LCWA and not to sequences of dynamic states where updating the 
LCWA database is an additional issue considered in both [9] and [3]. 

7 Example 

Example 1. The following example demonstrates the versatility of the approach 
by representing the UAV example in Section 1. There are four cars with different 
signatures based on color. The UAV’s focus of attention (FOA) is region r3. 
In $\mathcal{L}$, we assume complete information about the ContainedIn() relation by 
minimizing it (6), and the In() relation by maximizing it (5). (7) encodes the 
following LCWA by maximizing the relation See(): 

After sensing region r3 with a camera, we want to assume that we have seen 
all moving vehicles in the FOA (r3) except for those with signature gray. 

In querying the database using the LCWQuery algorithm, we can infer that 
See(c1, r3) holds, but it is unknown whether See(c2, r3) holds, due to its signature; 
unknown whether See(c3, r3) holds, because it is unknown whether c3 is moving; and 
unknown whether See(c4, r3) holds, because c4 is not in the FOA. Note that the latter 
queries return unknown and not false due to the incompleteness of the database. 
In fact, other sensors may contribute to whether the unknown vehicles are seen. 

$$
\begin{aligned}
\mathcal{L} = \{\, & \mathit{In}(x, r') \supset \neg(\mathit{In}(x, r) \wedge \mathit{ContainedIn}(r, r')), && (5) \\
& \mathit{ContainedIn}(r, r'), && (6) \\
& \mathit{See}(x, r3) \supset \neg(\mathit{InFOA}(r3) \wedge \mathit{In}(x, r3) \wedge \mathit{Sig}(x, s) \wedge s \neq \mathit{gray} \wedge \mathit{Moving}(x)) \,\} && (7)
\end{aligned}
$$

$$
\begin{aligned}
M = \{\, & \mathit{In}(c1, r1),\ \mathit{In}(c2, r2),\ \mathit{In}(c3, r1),\ \mathit{In}(c4, r4), \\
& \mathit{Moving}(c1),\ \mathit{Moving}(c2),\ \mathit{Moving}(c4), \\
& \mathit{Sig}(c1, \mathit{blue}),\ \mathit{Sig}(c2, \mathit{gray}),\ \mathit{Sig}(c3, \mathit{green}),\ \mathit{Sig}(c4, \mathit{yellow}), \\
& \mathit{ContainedIn}(r1, r3),\ \mathit{ContainedIn}(r2, r3),\ \mathit{InFOA}(r3) \,\}
\end{aligned}
$$

In order to understand why the constraints for In() and See() in $\mathcal{L}$ are 
represented in the manner above, it is important to observe that the relations 
we want to minimize or maximize are in fact relations that are varied in the 
circumscriptive definition used for LCWA. Consequently, the minimization and 
maximization are achieved indirectly. 

Another interesting observation is that the query generated by the quantifier 
elimination procedure results in a fixpoint formula due to the recursive definition 
of In(). 
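
To give a feel for the kind of fixpoint computation involved, the sketch below 
iterates a monotone operator to its least fixpoint. It assumes, purely for 
illustration, a Datalog-style reading in which a vehicle located in a region is 
also located in every enclosing region; this reading is an assumption and is not 
derived from constraint (5), and the code does not perform the circumscription-based 
reduction or the full LCWQuery procedure. 

```python
def least_fixpoint(step, start=frozenset()):
    """Iterate a monotone operator from `start` until nothing changes."""
    current = frozenset(start)
    while True:
        nxt = frozenset(step(current))
        if nxt == current:
            return current
        current = nxt

# Facts taken from the database M of Example 1.
in_facts = {("c1", "r1"), ("c2", "r2"), ("c3", "r1"), ("c4", "r4")}
contained_in = {("r1", "r3"), ("r2", "r3")}

def in_step(current):
    """One round: keep known facts and lift each vehicle to enclosing regions."""
    derived = {(x, outer)
               for (x, r) in current
               for (inner, outer) in contained_in
               if inner == r}
    return in_facts | current | derived

in_closure = least_fixpoint(in_step)
print(sorted(in_closure))
```

Under this reading, c1, c2 and c3 end up in r3 while c4 does not, which matches 
the remark that c4 is not in the FOA. 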

8 Conclusions 

We have extended and subsumed the LCW querying techniques described 
in [9,3,4] and [10], and presented a tractable algorithm. The technique is based 
on the use of circumscription and results from the deductive database community 
and is consequently amenable to generalization. We have demonstrated the 
versatility of the approach by encoding a relatively complex UAV sensing scenario. 
We have not yet dealt with the LCW update problem associated with the 
query mechanism’s integration with other planning and state sequential reasoning 
techniques considered in the other approaches, but are currently pursuing 
the problem. 

Abstract. The main operators in Inductive Logic Programming (ILP) 
are specialization and generalization. In ILP, the three most important 
generality orders are subsumption, implication and implication relative 
to background knowledge. The present paper discusses the existence of 
least generalization under implication relative to background knowledge. 
It has been shown that the least generalization under relative implication 
does not exist in the general case, but, as argued in this paper, it exists 
if the sets to be generalized and the background knowledge satisfy some 
special conditions. 


1 Introduction 

Inductive Logic Programming (ILP) is a subfield of Logic Programming and 
Machine Learning that investigates the problem of inducing clausal theories from 
given sets of positive and negative examples. An inductively inferred theory must 
imply all of the positive examples and none of the negative examples. The paper 
is organized as follows. Section 2 gives preliminary definitions of the concepts 
used in the subsequent discussion. Section 3 discusses the existence of a least 
generalization under relative implication: it does not exist in the general case, 
but it does exist if the sets to be generalized and the background knowledge (BK) 
are of a special kind. The most interesting and useful case is to find a least 
generalization under relative implication (LGRI), which is a set of definite 
program clauses, for a BK and sets of positive and negative examples that are 
definite program clauses. Section 3 shows that in this case, after imposing some 
additional restrictions on the given sets, a LGRI exists. 

A LGRI exists for many other, more particular cases of the given sets (for 
details see [5,6]). However, in most of them the background knowledge is a set 
of ground clauses or literals. Even though subsumption is weaker than implication, 
a least generalization under relative subsumption (LGRS) does not exist in the 
general case, either for the clausal language or for a Horn language. A LGRS 
exists only for background knowledge consisting of ground atoms. 

2 Preliminaries 

The definitions of the concepts used in the further discussion are given in this 
section. 

Definition 1: Let $\Sigma$ be a set of formulas and $\phi$ a formula. Then $\phi$ is said to be 
a logical consequence of $\Sigma$ (written as $\Sigma \models \phi$), if every model of $\Sigma$ is a model of 
$\phi$. If $\Sigma \models \phi$, we also sometimes say that $\Sigma$ logically implies (or just implies) $\phi$. 
If $\Sigma = \{\psi\}$, this can be written as $\psi \models \phi$. 

Definition 2: Let $\Sigma$ and $\Gamma$ be sets of formulas. $\Gamma$ is said to be a logical 
consequence of $\Sigma$ (written as $\Sigma \models \Gamma$), if $\Sigma \models \phi$ for every formula $\phi \in \Gamma$. We also 
sometimes say that $\Sigma$ (logically) implies $\Gamma$. 

Definition 3: Let $\Gamma$ be a set and R be a binary relation on $\Gamma$. 

1. R is reflexive on $\Gamma$ if xRx for every $x \in \Gamma$. 

2. R is transitive on $\Gamma$ if for every $x, y, z \in \Gamma$, xRy and yRz implies xRz. 

3. R is symmetric on $\Gamma$ if for every $x, y \in \Gamma$, xRy implies yRx. 

4. R is anti-symmetric on $\Gamma$ if for every $x, y \in \Gamma$, xRy and yRx implies x = y. 

If R is both reflexive and transitive on $\Gamma$ we say R is a quasi-order on $\Gamma$. If R is 
reflexive, transitive and anti-symmetric on $\Gamma$ we say R is a partial order on $\Gamma$. If R 
is reflexive, transitive and symmetric on $\Gamma$ we say R is an equivalence relation. 
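
For a finite set, the properties in Definition 3 can be checked directly. The 
following is a minimal sketch in which a relation is encoded as a set of pairs; 
the function names and the two sample relations are illustrative choices, not 
taken from the paper. 

```python
def is_reflexive(rel, elems):
    return all((x, x) in rel for x in elems)

def is_transitive(rel):
    return all((x, w) in rel
               for (x, y) in rel
               for (z, w) in rel
               if y == z)

def is_symmetric(rel):
    return all((y, x) in rel for (x, y) in rel)

def is_antisymmetric(rel):
    return all(x == y for (x, y) in rel if (y, x) in rel)

def is_quasi_order(rel, elems):
    return is_reflexive(rel, elems) and is_transitive(rel)

elems = {1, 2, 3, 6}
# "x divides y": a quasi-order that is also anti-symmetric.
divides = {(x, y) for x in elems for y in elems if y % x == 0}
# "x and y have the same parity": a quasi-order that is symmetric instead.
same_parity = {(x, y) for x in elems for y in elems if x % 2 == y % 2}

print(is_quasi_order(divides, elems), is_antisymmetric(divides))
print(is_quasi_order(same_parity, elems), is_antisymmetric(same_parity))
```

The second relation shows why a quasi-order need not be a partial order: 1 and 3 
are related in both directions without being equal. 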



Definition 4: Let $\Gamma$ be a set of clauses, $\geq$ be a quasi-order on $\Gamma$, $S \subseteq \Gamma$ be a 
finite set of clauses and $C \in \Gamma$. If $C \geq D$ for every $D \in S$, then we say that C is 
a generalization of S under $\geq$. Such a C is called a least generalization (LG) of S 
under $\geq$ in $\Gamma$ if we have $C' \geq C$ for every generalization $C' \in \Gamma$ of S under $\geq$. 

Dually, C is a specialization of S under $\geq$, if $D \geq C$ for every $D \in S$. Such 
a C is called a greatest specialization (GS) of S under $\geq$ in $\Gamma$ if we have $C \geq C'$ 
for every specialization $C' \in \Gamma$ of S under $\geq$. 

Theorem 1 (Deduction Theorem): Let $\Sigma$ be a set of formulas and $\phi$ and $\psi$ 
be formulas. Then $\Sigma \cup \{\psi\} \models \phi$ iff $\Sigma \models (\psi \rightarrow \phi)$. 

Proposition 1: Let $\Sigma$ be a set of formulas and $\phi$ be a formula. Then $\Sigma \models \phi$ 
iff $\Sigma \cup \{\neg\phi\}$ is unsatisfiable. 

Definition 5: Let B be background knowledge (a set of clauses) and C and D be 
clauses. We say that C logically implies D relative to B if $\{C\} \cup B \models D$, and 
we denote this as $C \models_B D$. 

Definition 6 (Concept learning problem): Given background knowledge B 
and given sets of positive and negative examples P and N, the induction task 
of a concept learning problem is to find a concept description in the form of a 
logic program T that satisfies the following conditions: 

1. $T \cup B \models A$ for all $A \in P$ (posterior sufficiency) 

2. $T \cup B \not\models A$ for all $A \in N$ (posterior satisfiability) 

3. $B \not\models A$ for all $A \in N$ (prior satisfiability) 

4. $B \not\models A$ for all $A \in P$ (prior necessity) 

Every such program T is called a target program. 

Definition 7: Let H and B be sets of clauses and D be a clause. H is a least 
generalization of D under relative implication (LGRI) to background knowledge B, 
if $H \models_B D$ and for each set of clauses C such that $C \models_B D$, it holds that $C \models_B H$. 

Definition 8: Let C and D be clauses. We say that C subsumes D, denoted 
$C \geq D$, if there exists a substitution $\theta$ such that $C\theta \subseteq D$. 
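
Subsumption as in Definition 8 can be tested by a simple backtracking search for 
the substitution. The sketch below is an illustration under stated assumptions: 
clauses are function-free, a literal is encoded as a (sign, predicate, args) 
triple, and terms starting with an upper-case letter are variables (the Prolog 
convention). Terms of the second clause are never substituted. The encoding and 
function names are hypothetical. 

```python
def is_var(term):
    return term[:1].isupper()

def match(lit_c, lit_d, theta):
    """Try to extend theta so that lit_c under theta equals lit_d."""
    sign_c, pred_c, args_c = lit_c
    sign_d, pred_d, args_d = lit_d
    if (sign_c, pred_c, len(args_c)) != (sign_d, pred_d, len(args_d)):
        return None
    theta = dict(theta)  # work on a copy so backtracking is safe
    for s, t in zip(args_c, args_d):
        if is_var(s):
            if theta.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return theta

def subsumes(c, d):
    """Return a substitution theta with c*theta a subset of d, or None."""
    def search(remaining, theta):
        if not remaining:
            return theta
        first, rest = remaining[0], remaining[1:]
        for lit_d in d:
            extended = match(first, lit_d, theta)
            if extended is not None:
                result = search(rest, extended)
                if result is not None:
                    return result
        return None
    return search(list(c), {})

# food(X) :- tasty(X)   versus   food(Y) :- tasty(Y), strawberry(Y)
general = [(True, "food", ("X",)), (False, "tasty", ("X",))]
specific = [(True, "food", ("Y",)), (False, "tasty", ("Y",)),
            (False, "strawberry", ("Y",))]

print(subsumes(general, specific))
print(subsumes(specific, general))
```

Because an empty substitution is a valid answer, callers should compare the 
result with `None` rather than rely on its truth value. 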



Definition 9: Let L be a first-order language. The Herbrand universe $U_L$ for L 
is the set of all ground terms, which can be formed out of the constants and 
function symbols appearing in L. In case L does not contain any constants, we 
add one arbitrary constant to the alphabet to be able to form ground terms. 

Definition 10: Let L be a first-order language. The Herbrand base $B_L$ for L is 
the set of all ground atoms, which can be formed out of the predicate symbols 
in L and the terms in the Herbrand universe $U_L$. 
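
For a function-free language both sets are finite and can be enumerated. The 
following is a minimal sketch under that assumption; the constant name `a0` used 
when the language has no constants is an arbitrary, hypothetical choice. 

```python
from itertools import product

def herbrand_universe(constants):
    """Function-free case: the universe is just the set of constants.

    If there are none, one arbitrary constant is added, as in Definition 9.
    """
    return sorted(constants) if constants else ["a0"]

def herbrand_base(predicates, constants):
    """`predicates` maps each predicate symbol to its arity."""
    universe = herbrand_universe(constants)
    return [(p, args)
            for p, arity in sorted(predicates.items())
            for args in product(universe, repeat=arity)]

base = herbrand_base({"P": 1, "Q": 1}, {"a", "b", "d"})
for atom in base:
    print(atom)
```

With function symbols the universe becomes infinite, so an enumeration like this 
would need a depth bound. 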



Definition 11: Let L be a first-order language. The Herbrand pre-interpretation 
for L is the pre-interpretation J consisting of the following: 

1. The domain of the pre-interpretation is the Herbrand universe $U_L$. 

2. Constants in L are assigned to themselves in $U_L$: J(a) = a for each constant a. 

3. Each n-ary function symbol f in L is assigned the mapping $J_f$ from $U_L^n$ 
to $U_L$, defined by $J_f(t_1, \ldots, t_n) = f(t_1, \ldots, t_n)$. 

Definition 12: Let L be a first-order language and J a Herbrand 
pre-interpretation. Any interpretation I such that $J \subseteq I$ is called a Herbrand 
interpretation. 

Definition 13: Let L be a first-order language, $\Sigma$ a set of formulas of L, and I 
a Herbrand interpretation of L. If I is a model of $\Sigma$, it is called a Herbrand 
model of $\Sigma$. 

Definition 14: Clause C subsumes (or is more general than) clause D with 
respect to logic program P if for any Herbrand interpretation I (for the language 
of at least P, C, D) such that P is true in I, and for any atom A, C covers A 
in I whenever D covers A. This is denoted $C \geq_P D$. C is referred to as a 
generalization of D, and D as a specialization of C. 

All other concepts used above have the standard definitions. For more details 
see [1,2,3,5,6]. 

3 Existence of Least Generalization under Relative 
Implication 

In this section we will discuss the existence of least generalization under relative 
implication. 

For general clauses, the LGRI-question has a negative answer. We will sketch 
the counter example given in [5,6]. 

3.1 Example for Non-existence of LGRI in the General Case 

Example 1: Even if S and the background knowledge $\Sigma$ are both finite sets 
of function-free clauses, a LGRI of S relative to $\Sigma$ does not necessarily exist. 
Let $D_1 = P(a)$, $D_2 = P(b)$, $S = \{D_1, D_2\}$ and 
$\Sigma = \{(P(a) \vee \neg Q(x)), (P(b) \vee \neg Q(x))\}$. We will show that S has no LGRI 
relative to $\Sigma$. 

Suppose C is a LGRI of S relative to $\Sigma$. Note that if C contains the literal 
P(a), then the Herbrand interpretation that makes P(a) true and which makes 
all other ground literals false would be a model of $\Sigma \cup \{C\}$ but not of $D_2$, so we 
have $C \not\models_\Sigma D_2$. Similarly, if C contains P(b) then $C \not\models_\Sigma D_1$. Hence C cannot 
contain P(a) or P(b). 

Now let d be a constant not appearing in C. Let $D = P(x) \vee Q(d)$. Then 
$D \models_\Sigma S$. By the definition of the LGRI, we should have $D \models_\Sigma C$. Then by the 
Subsumption Theorem [5], there must be a derivation from $\Sigma \cup \{D\}$ of a clause E 
which subsumes C. The set of all clauses which can be derived (in 0 or more 
resolution steps) from $\Sigma \cup \{D\}$ is $\Sigma \cup \{D\} \cup \{(P(a) \vee P(x)), (P(b) \vee P(x))\}$, but 
none of these clauses subsumes C, because C does not contain the constant d or 
the literals P(a) and P(b). Hence $D \not\models_\Sigma C$, which contradicts the assumption that C 
is a LGRI of S relative to $\Sigma$. 

Thus, in general, a LGRI of S relative to $\Sigma$ need not exist. 
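
One step of Example 1, namely that D together with the background knowledge 
implies both P(a) and P(b), can be confirmed by brute force. The sketch below 
enumerates all Herbrand interpretations over the constants a, b and d; for sets 
of clauses this suffices to decide entailment of a ground clause. The encoding 
(upper-case `X` for the variable x, literals as triples) and the function names 
are assumptions made for the illustration. 

```python
from itertools import product

CONSTS = ["a", "b", "d"]
ATOMS = [(p, (c,)) for p in ("P", "Q") for c in CONSTS]

def is_var(term):
    return term[:1].isupper()

def ground_instances(clause):
    """Yield all ground instances of a clause over CONSTS."""
    variables = sorted({t for (_, _, args) in clause for t in args if is_var(t)})
    for values in product(CONSTS, repeat=len(variables)):
        theta = dict(zip(variables, values))
        yield [(sign, pred, tuple(theta.get(t, t) for t in args))
               for (sign, pred, args) in clause]

def holds(ground_clause, true_atoms):
    """A ground clause holds if at least one of its literals is satisfied."""
    return any(((pred, args) in true_atoms) == sign
               for (sign, pred, args) in ground_clause)

def entails(theory, goal):
    """Check theory |= goal for a ground clause `goal`."""
    grounded = [g for clause in theory for g in ground_instances(clause)]
    for bits in product([False, True], repeat=len(ATOMS)):
        true_atoms = {atom for atom, bit in zip(ATOMS, bits) if bit}
        if all(holds(g, true_atoms) for g in grounded) and not holds(goal, true_atoms):
            return False  # found a model of the theory falsifying the goal
    return True

SIGMA = [
    [(True, "P", ("a",)), (False, "Q", ("X",))],
    [(True, "P", ("b",)), (False, "Q", ("X",))],
]
D = [(True, "P", ("X",)), (True, "Q", ("d",))]
P_a = [(True, "P", ("a",))]
P_b = [(True, "P", ("b",))]

print(entails(SIGMA + [D], P_a), entails(SIGMA + [D], P_b))
print(entails(SIGMA, P_a))
```

The background knowledge alone does not imply P(a), so the clause D is doing 
real work here. 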

3.2 Analyses of Some Properties of the Given Sets 

Where is the weak point? Let us look again at the background knowledge set 
$\Sigma = \{(P(a) \vee \neg Q(x)), (P(b) \vee \neg Q(x))\}$. We can present this set in the following 
equivalent form: $\Sigma = \{(Q(x) \rightarrow P(a)), (Q(x) \rightarrow P(b))\}$. The BK set $\Sigma$ consists 
of Horn clauses and we can represent it as the program: 

p(a) :- q(X). 
p(b) :- q(X). 

We can see that two different ground instances (P(a) and P(b)) of the 
predicate P(x) can be inferred from an arbitrary ground instance of Q(x). One 
of the possible generalizations of the given set relative to the BK is: 

p(Y) :- q(X). 

But there is no dependency between the variables X and Y, and this is not 
a useful generalization, because it is not generative. Thus, some restrictions on 
the BK and the set to be generalized must be made to ensure the existence 
of a LGRI. Some examples of cases when a LGRI does exist will be given and 
after analysing them we will formulate the requirements for the BK and the 
initial set. Let the BK $\Sigma = \{C_1, C_2, \ldots, C_m\}$ be a finite set of clauses and 
$S = \{D_1, D_2, \ldots, D_n\}$ be a finite set of clauses. Additionally we suppose that: 

- a substitution $\theta$ such that $C_i^{body}\theta = C_j^{body}$, for $i \neq j$, does not exist 

- a predicate A such that $A' \in C_i^{head}$ and $A'' \in C_j^{head}$, where $A'$ and $A''$ are 
ground instances of A, does not exist. 



Example 2: Consider the following set of positive examples: 

C1 = food(X) :- tasty(X), strawberry(X). 

C2 = food(X) :- tasty(X), not_poisonous(X), mushroom(X). 

The most obvious way to generalize them is to take their least generalization 
under implication, which is the rather general and not very useful clause: 

D = food(X) :- tasty(X). 

Suppose we have the following definite program $\Sigma = \{B_1, B_2, B_3\}$, expressing 
background knowledge: 

B1 = plant(X) :- mushroom(X). 

B2 = plant(X) :- strawberry(X). 

B3 = not_poisonous(X) :- strawberry(X). 

Taking $\Sigma$ into account, we may also find the more informative generalization 
clause: 

D' = food(X) :- tasty(X), not_poisonous(X), plant(X). 

D' together with $\Sigma$ implies both examples, but without the BK our clause D' 
does not imply the examples. Note also that not everything that tastes delicious 
is edible; some things can be poisonous or harmful for people. 
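
The claim that D' implies the examples only together with the BK can be checked 
mechanically. The sketch below uses a standard technique for function-free 
definite programs, which is an illustration and not a procedure from the paper: 
replace the variables of the target clause by fresh constants, add its body atoms 
as facts, forward-chain, and test whether the head is derived. All function names 
and the `sk` prefix for fresh constants are hypothetical. 

```python
from itertools import product

def is_var(term):
    return term[:1].isupper()

def substitute(atom, theta):
    pred, args = atom
    return (pred, tuple(theta.get(t, t) for t in args))

def forward_chain(rules, facts, constants):
    """Least Herbrand model of a function-free definite program (naive iteration)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            variables = sorted({t for atom in [head] + body
                                for t in atom[1] if is_var(t)})
            for values in product(constants, repeat=len(variables)):
                theta = dict(zip(variables, values))
                if all(substitute(b, theta) in known for b in body):
                    new_fact = substitute(head, theta)
                    if new_fact not in known:
                        known.add(new_fact)
                        changed = True
    return known

def implies_clause(rules, clause, skolem="sk"):
    """Test rules |= clause by Skolemizing the clause and forward chaining."""
    head, body = clause
    variables = {t for atom in [head] + body for t in atom[1] if is_var(t)}
    theta = {v: f"{skolem}_{v}" for v in variables}
    rule_consts = {t for h, b in rules for atom in [h] + b
                   for t in atom[1] if not is_var(t)}
    constants = sorted(set(theta.values()) | rule_consts) or [skolem]
    facts = {substitute(b, theta) for b in body}
    return substitute(head, theta) in forward_chain(rules, facts, constants)

X = ("X",)
C1 = (("food", X), [("tasty", X), ("strawberry", X)])
C2 = (("food", X), [("tasty", X), ("not_poisonous", X), ("mushroom", X)])
B1 = (("plant", X), [("mushroom", X)])
B2 = (("plant", X), [("strawberry", X)])
B3 = (("not_poisonous", X), [("strawberry", X)])
D_prime = (("food", X), [("tasty", X), ("not_poisonous", X), ("plant", X)])

background = [B1, B2, B3]
print(implies_clause([D_prime] + background, C1))
print(implies_clause([D_prime] + background, C2))
print(implies_clause([D_prime], C1))
```

The last call fails because, without B2 and B3, nothing establishes that a tasty 
strawberry is a non-poisonous plant. 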




Fig. 1. The main view of the V-operator and the W-operator 

In their article [7], Muggleton and Buntine described two operators based on 
inverting resolution steps: the V-operator and the W-operator (Fig. 1). 

Given $C_1$ and R, the V-operator finds $C_2$ such that R is an instance of a 
resolvent of $C_1$ and $C_2$. Thus the V-operator generalizes $\{C_1, R\}$ to $\{C_1, C_2\}$. The 
W-operator combines two V-operators, and generalizes $\{R_1, R_2\}$ to $\{C_1, C_2, C_3\}$, 
such that $R_1$ is an instance of a resolvent of $C_1$ and $C_2$, and $R_2$ is an instance 
of a resolvent of $C_3$ and $C_2$. In addition, the W-operator is able to invent new 
predicates. 

Going back to the example described above, it is easy to see that D' is the result 
of consecutively applying the V-operator (see Fig. 3) and the W-operator (see Fig. 2) 
to C1, C2 and the clauses of $\Sigma$. 

Fig. 2. The W-operator applied to C2, B1, B2 and F 

Let D' be the result of the W-operator applied to C2, B1, B2 and F. D' is the 
LGRI of {C1, C2} under {B1, B2, B3}. 

Let F be the result of the V-operator applied to C1 and B3. 



3.3 More Definitions 

These two operators require some restrictions on the type of the given clauses. 
The following definitions will help us to describe some of them. 



[Figure: V-operator diagram showing F = food(X) :- tasty(X), not_poisonous(X), 
strawberry(X) and B3 = not_poisonous(X) :- strawberry(X)] 

Fig. 3. The V-operator applied to C1 and B3 

Definition 15: Let C be a clause. C is a generative clause if all variables 
in $C^{head}$ are contained in $C^{body}$. 
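
The generative condition of Definition 15 is easy to test mechanically. The sketch 
below assumes a clause is encoded as a (head, body) pair of atoms and that terms 
starting with an upper-case letter are variables; both the encoding and the 
function names are illustrative. 

```python
def is_var(term):
    return term[:1].isupper()

def variables_of(atoms):
    """Collect the variables occurring in a list of (predicate, args) atoms."""
    return {t for (_, args) in atoms for t in args if is_var(t)}

def is_generative(clause):
    """True if every variable of the head also occurs in the body."""
    head, body = clause
    return variables_of([head]) <= variables_of(body)

# p(Y) :- q(X).  The head variable Y does not occur in the body.
not_generative = (("p", ("Y",)), [("q", ("X",))])
# food(X) :- tasty(X).
generative = (("food", ("X",)), [("tasty", ("X",))])

print(is_generative(not_generative))
print(is_generative(generative))
```

A ground fact such as p(a) counts as generative under this test, since its head 
contains no variables at all. 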



Definition 16: Let C be a clause and $\Sigma$ be a set of clauses. Let C contain n 
different variables. C is a determined clause with respect to $\Sigma$ if, after binding 
n - 1 variables of C with terms of $\Sigma$, for the remaining variable of C there exists 
a unique substitution that binds this variable with a term contained in $\Sigma$. 

Definition 17: Let $\Sigma = \{C_1, C_2, \ldots, C_m\}$ and $S = \{D_1, D_2, \ldots, D_n\}$ be finite 
sets of clauses. S has an absolute model under $\Sigma$ if for each $D_i \in S$ and for each 
literal $L \in D_i^{body}$ there exists a clause E (some $C_j \in \Sigma$ or some $D_j \in S$) 
and a substitution $\sigma$ such that $L\sigma \in E$. 

For a clause $C_2$ to exist, the V-operator requires $C_1$ and R to be generative 
clauses. 

For clauses $C_1$, $C_2$ and $C_3$ to exist, the W-operator requires $R_1$, $R_2$ to be 
generative clauses. The found clause $C_3$ is generative and determined with respect 
to the set $\{R_1, R_2, C_1, C_2\}$. 

The clauses $R_1$, $R_2$ have one and the same head, hence the clause $C_3$ will 
have the same head and the clause $C_2$ will be a generalization of the set of clauses 
$\{R_1, R_2, C_1, C_3\}$. 

Suppose that $R_1$, $R_2$ are members of the given set of clauses that we would 
like to generalize, and $C_1$ and $C_3$ are clauses from the background knowledge. 
We can consider $C_2$ as a generalization under implication of $R_1$, $R_2$ relative to 
the background knowledge set $\{C_1, C_3\}$. 

The clause $C_2$ is a LGRI of $R_1$, $R_2$ because it is generated by one resolution 
step. 

If the set to be generalized has an absolute model under the background 
knowledge, then we can easily combine clauses from the given set and the BK in 
V- and W-operators. 

The previous discussion enables the formulation of the following theorem. 

3.4 Theorem of Existence of LGRI in Limited Case 

Theorem 2: Let $\Sigma = \{C_1, C_2, \ldots, C_m\}$ be a finite set of function-free definite 
program clauses and $S = \{D_1, D_2, \ldots, D_n\}$ be a set of function-free definite 
program clauses such that all $D_i$ have the same predicate symbol in their heads and 
at least one of them is non-tautologous. If S has an absolute model under $\Sigma$, 
all the clauses of S are generative and $\Sigma \not\models S$, then a LGRI of S relative to $\Sigma$ 
exists. 



Proof: Let $\Sigma = \{C_1, C_2, \ldots, C_m\}$ be a finite set of function-free definite 
program clauses and $S = \{D_1, D_2, \ldots, D_n\}$ be a set of function-free definite program 
clauses such that all $D_i$ have the same predicate symbol in their heads and at least 
one of them is non-tautologous. The LGRI T of S relative to $\Sigma$ exists if for 
every $C_i, C_j \in \Sigma$ there does not exist a substitution $\theta$ such that $C_i^{body}\theta = C_j^{body}$, 
and there does not exist a predicate A such that $A' \in C_i^{head}$ and $A'' \in C_j^{head}$, 
where $A'$ and $A''$ are ground instances of A, and for every $D_i \in S$ and for every 
literal $L \in D_i$ there exists a clause E (some $C_j \in \Sigma$ or some $D_j \in S$) and a 
substitution $\sigma$ such that $L\sigma \in E$. 

Then $T \models_\Sigma D_i$ iff $\{T\} \cup \Sigma \models D_i$ iff $T \models D_i \cup \neg\Sigma$. It remains to be 
shown that $D_i \cup \neg\Sigma$ is a set of function-free clauses and at least one of them is 
non-tautologous. Then by the theorem for existence of the least generalization 
under implication (LGI) [5,6], it will follow that a generalization H exists. The 
clauses of the set $D_i \cup \neg\Sigma$ are function-free, as required in the condition of the 
theorem. Since each $D_j \in S$ has the same predicate in its head, each clause in 
$T = \{(D_1 \cup \neg\Sigma), (D_2 \cup \neg\Sigma), \ldots, (D_n \cup \neg\Sigma)\}$ will contain the same predicate in its 
head. 

Because of the conditions of the theorem, each of the elements of T is a 
definite program clause. 

It remains to show that $T = \{(D_1 \cup \neg\Sigma), (D_2 \cup \neg\Sigma), \ldots, (D_n \cup \neg\Sigma)\}$ contains at 
least one non-tautologous clause. Suppose that all clauses in T are tautologous. 
From the definition of a tautologous clause we conclude that every interpretation 
is a model of the clauses in T, in other words $\models D_i \cup \neg\Sigma$, hence $\Sigma \models D_i$ 
for each $D_i \in S$, hence $\Sigma \models S$, which contradicts the conditions of the 
theorem. 

So, T is a set of definite program clauses and at least one of them is 
non-tautologous. From the theorem for existence of LGI (see [5,6]), we obtain that 
there exists a LGI H of T, and H will be a LGRI of S relative to $\Sigma$. 

Why do we need the restrictions on the sets in Theorem 2? Are they too strong 
or not? 

Most of the restrictions are necessary because of the requirements of the V- and 
W-operators for the existence of the generalization clause and its computability. 

The restriction $\Sigma \not\models S$ is imposed by the definition of the concept learning 
problem (prior necessity). 

The restriction that all clauses of S contain one and the same predicate symbol in 
their heads is imposed by the need for the obtained LGRI of S under the BK to 
be a program that gives the definition of the concept coded by this predicate 
symbol. 

The other restrictions come from the analysis of the counterexample 
mentioned above and from the requirement for the background knowledge to be 
consistent. 

3.5 Computability of LGRI 

This LGRI is computable because it is a kind of LGI, which is computable. There exists an algorithm for constructing the LGI of given sets. This algorithm is not very efficient. A more efficient algorithm may exist, but since implication is harder than subsumption and the computation of an LGS is already quite expensive, we should not put our hopes too high. Nevertheless, the existence of the LGI algorithm does establish the theoretical point that the LGI of any finite set of clauses containing at least one non-tautologous function-free clause is effectively computable.
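To give a feel for why even the subsumption case is costly, the following is a minimal sketch of the basic step behind a least generalization under subsumption (LGS): anti-unification of function-free literals. It is an illustration only, not the LGI algorithm referred to above. The tuple representation and the names `lgg_atoms` and `lgg_clauses` are assumptions made for this example, and the inputs are assumed to be ground.

```python
# Hypothetical sketch: anti-unification of function-free atoms, the basic
# step in computing a least generalization under subsumption (LGS).
# An atom is a tuple (predicate, arg1, ..., argn) of strings; a clause is
# a list of (sign, atom) literals with sign "+" (head) or "-" (body).

def lgg_atoms(a, b, table):
    """Generalize atoms a and b, or return None if they are incompatible.

    `table` maps each pair of differing arguments to one variable, so the
    same pair is always replaced by the same variable.
    """
    if a[0] != b[0] or len(a) != len(b):
        return None
    args = []
    for s, t in zip(a[1:], b[1:]):
        if s == t:
            args.append(s)
        else:
            if (s, t) not in table:
                table[(s, t)] = "V%d" % len(table)
            args.append(table[(s, t)])
    return (a[0],) + tuple(args)


def lgg_clauses(c, d):
    """Pair every two literals with equal sign and predicate."""
    table = {}
    result = []
    for sign_c, atom_c in c:
        for sign_d, atom_d in d:
            if sign_c != sign_d:
                continue
            g = lgg_atoms(atom_c, atom_d, table)
            if g is not None and (sign_c, g) not in result:
                result.append((sign_c, g))
    return result


c = [("+", ("father", "tom", "ann")),
     ("-", ("male", "tom")), ("-", ("parent", "tom", "ann"))]
d = [("+", ("father", "bob", "sue")),
     ("-", ("male", "bob")), ("-", ("parent", "bob", "sue"))]
generalization = lgg_clauses(c, d)
```

Because every compatible pair of literals is considered, the result can grow up to the product of the clause sizes, which is one reason the computation is already expensive before implication is taken into account.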

4 Conclusion 

The presented case of existence of a least generalization under relative implication helps us to search for generalizations of concepts presented by the most natural and frequently used types of sets and background knowledge. In the concept learning problem, examples are usually presented as ground literals and/or definite program clauses, and the background knowledge is a program. It is reasonable to expect that the LGRI of these sets will be a program too.

The contribution of this paper is the discovery of a more general case (than those described in the literature) of existence of a least generalization under relative implication. This result can be used for several applications in the field of machine learning, such as automated generation of concept definitions, improvement of predicate definitions and other kinds of concept generalization.

In future work, a simpler algorithm for finding the least generalization under relative implication will be presented for the described cases. Another line of research is to find other cases of existence of LGRI.
Theorem Proving for Constructive λ-Calculus

Abstract. This paper presents a theorem prover for a combination of constructive first-order logic and the λ-calculus. The paper presents the basic theorem prover, which is an extension of [6]'s model generation theorem prover for first-order logic, and considers issues relating to the compile-time optimisations that are often used with first-order theorem provers.



1 A Constructive Intensional Logic 

For various reasons, the idea of a language which allows you to construct abstractions and apply them to terms, as in the λ-calculus, and to combine these operations with the truth functional connectives of predicate logic, is extremely tempting. It is well known, however, that simply adding the λ-calculus and predicate logic together opens the way to the paradoxes of negative self-reference: the Liar, Russell's set, and so on.

The classical way out of this is to place restrictions on what can be 
said [11,8,3]. [9] approaches the matter by allowing you to say whatever you 
want, but then placing constraints on what can be proved. The current paper 
follows this general approach, but uses entirely different constraints. 

Turner takes a classical treatment of first-order logic and adds λ-abstraction and β-reduction to it (or at any rate, operations which look extremely like λ-abstraction and β-reduction). In order to avoid the paradoxes, however, he constrains the circumstances under which you are allowed to perform λ-abstraction. The constraints he chooses are enough to make the underlying logic consistent (in other words, to avoid the paradoxes), and make the paradoxes unstable [2]. The current paper takes a constructive treatment of first-order logic, allows unrestricted use of both λ-abstraction and β-reduction, but avoids the paradoxes by placing constraints on the assumptions that can be used in a well-founded proof.

The logic which we will use, which we will call Λ(C) for constructive λ-calculus, extends first-order logic as follows:

Λ(C)-1 If A is a formula of first-order logic then it is a formula of Λ(C), and if t is a term of first-order logic then it is a term of Λ(C).

Λ(C)-2 If A is a formula of Λ(C), possibly including free occurrences of x, then λxA is a term of Λ(C).

Λ(C)-3 If t and t' are terms of Λ(C) then t.t' is a formula.

The proof theory for Λ(C) is obtained by adding the rules in Fig. 1 to a standard set of natural deduction rules, which I will refer to as ND (these rules are omitted here for space reasons, but any standard text on constructive logic will provide such a set, e.g. [10]).



(λ-intro:)

$$\frac{\alpha \vdash [\ldots, A, \ldots]}{\alpha \vdash [\ldots, (\lambda x A').t, \ldots]}$$

([..., A, ...] is any formula containing A as a subformula or a term, and A' is the formula that is obtained from A by replacing 0 or more instances of t in A by x.)

(λ-elim:)

$$\frac{\alpha \vdash [\ldots, (\lambda x A).t, \ldots]}{\alpha \vdash [\ldots, A_{t/x}, \ldots]}$$



Fig. 1. Natural deduction rules for A(C) 

(λ-intro) and (λ-elim) add λ-abstraction and β-reduction to ND. Theorem 1 shows that we can do this without introducing proofs of ⊥.

Theorem 1. Soundness of Λ(C)

If there is no proof of ⊥ from α using ND then any proof of ⊥ from α using all the rules of Λ(C) introduces some irreducible instance of (λxA).t.

Proof. 

Suppose that $\alpha_0 \vdash A_0, \alpha_1 \vdash A_1, \ldots, \alpha_n \vdash \bot$ is a proof of ⊥ from $\alpha_0$ using the rules of Λ(C), where $\alpha_0$ contains no irreducible λ-applications and there is no proof of ⊥ from $\alpha_0$ just using ND; and that there is no proof of ⊥ from any set β which also satisfies the conditions but which contains fewer applications of λ-elim and λ-intro.

Consider the first use in this proof of ⊥ from $\alpha_0$ of either (i) λ-elim to change some formula (λxA).t into $A_{t/x}$ or (ii) λ-intro to change some formula $A_{t/x}$ into (λxA).t (there must be one, since otherwise $\alpha_0$ would have supported a proof of ⊥ from ND alone). In case (i) we can obtain a proof of ⊥ from $\alpha \cup A_{t/x}$, and in case (ii) we can obtain one from $\alpha \cup (\lambda x A).t$, each of which omits the relevant step, and hence involves fewer applications of these rules, contradicting the assumption (note that (λxA).t is irreducible iff $A_{t/x}$ is, so that the first step which adds either of these to α will not introduce an irreducible formula unless there was already one there). □

The point of this theorem is that any proof of ⊥ from a set which is consistent under the first-order rules must introduce some irreducible formula (since otherwise every subproof would satisfy the conditions of the theorem). If we take the constructive view of λ-applications as promissory notes for proofs, or for programs that would produce proofs [10], then irreducible formulae are promises that can never be fulfilled.

¹ (λxA).t is irreducible if there is no sequence of applications of λ-elim which will produce a term with no occurrences at all of (λyB).s.

Caveat emptor: Theorem 1 shows that the logic is sound so long as you avoid formulae with irreducible applications of λ-terms. It does not prevent you from saying things which would have been better left unsaid. It is perfectly easy to ask whether the property λx(¬x.x) holds of itself, and to conclude that it both does and doesn't. But (λx(¬x.x)).(λx(¬x.x)) is irreducible, and hence there is no reason to suppose that it can be investigated safely.
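The behaviour of this self-applied term can be observed with a small reducer. The sketch below is an illustration under stated assumptions, not part of the paper's system: the tuple encoding of terms, the function names, and the `fuel` bound are all hypothetical, and substitution assumes that bound variable names never clash with free variables of the argument. Since reducibility is only semi-decidable, running out of fuel suggests, but does not prove, that a term is irreducible.

```python
# Hypothetical term encoding (not from the paper):
#   ("var", name)        a variable or constant symbol
#   ("lam", name, body)  the abstraction  λname.body
#   ("app", f, t)        the application  f.t
#   ("not", body)        negation

def subst(term, name, value):
    """Replace free occurrences of `name` in `term` by `value`.

    Assumes no bound name in `term` clashes with a free variable of
    `value`, so no capture can occur.
    """
    kind = term[0]
    if kind == "var":
        return value if term[1] == name else term
    if kind == "lam":
        if term[1] == name:          # `name` is rebound here: stop
            return term
        return ("lam", term[1], subst(term[2], name, value))
    if kind == "app":
        return ("app", subst(term[1], name, value),
                subst(term[2], name, value))
    return ("not", subst(term[1], name, value))


def step(term):
    """Perform one λ-elim step, or return None if there is no redex."""
    kind = term[0]
    if kind == "app":
        f, t = term[1], term[2]
        if f[0] == "lam":
            return subst(f[2], f[1], t)
        r = step(f)
        if r is not None:
            return ("app", r, t)
        r = step(t)
        return None if r is None else ("app", f, r)
    if kind == "lam":
        r = step(term[2])
        return None if r is None else ("lam", term[1], r)
    if kind == "not":
        r = step(term[1])
        return None if r is None else ("not", r)
    return None


def fully_reduce(term, fuel=50):
    """Reduce until no redex is left; None if `fuel` steps do not suffice."""
    for _ in range(fuel):
        nxt = step(term)
        if nxt is None:
            return term
        term = nxt
    return None


# (λx p.x).a reduces to p.a
simple = ("app", ("lam", "x", ("app", ("var", "p"), ("var", "x"))), ("var", "a"))
# (λx ¬(x.x)).(λx ¬(x.x)) only ever grows another negation
w = ("lam", "x", ("not", ("app", ("var", "x"), ("var", "x"))))
liar = ("app", w, w)
```

Each step on `liar` yields the same application wrapped in one more negation, so no number of steps removes the λ-application.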

2 Satchmo for Λ(C)

The theorem prover we will present for Λ(C) is developed by extending [6]'s
first-order theorem prover Satchmo. The original presentation of Satchmo is 
very effective for puzzles (where all the information that is present is required 
for solving the problem, but it is hard to see how to use it), but can perform very 
poorly on problems where a lot of the information that is present is irrelevant. 
[7] and [5] show how to avoid some of the pathological behaviour of the basic 
Satchmo algorithm in such circumstances: unfortunately these techniques, like 
most other optimisations for first-order theorem provers [4] [1], rely on a static 
analysis of the problem. Optimisations that rely on static analysis of the initial 
problem statement do not work for intensional logics. Section 3 shows how to 
recover such optimisations dynamically. 

The original presentation of Satchmo is unsuitable for our purposes, since it assumes a classical version of predicate logic, so that you can prove P by showing that ¬P is unsatisfiable, and you can also use equivalences such as ((P → Q) → R) ↔ ((Q → R) & (P or R)) which are not available in constructive logic. We therefore need to adapt it so that it does work properly for ND.
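A truth table confirms that the equivalence between ((P → Q) → R) and (Q → R) & (P or R) holds classically. The check below is a small illustration; the function names are chosen for this example. Note that a truth table captures classical validity only, so it says nothing about whether the equivalence is constructively provable.

```python
from itertools import product


def implies(a, b):
    """Classical material implication."""
    return (not a) or b


def lhs(p, q, r):
    return implies(implies(p, q), r)


def rhs(p, q, r):
    return implies(q, r) and (p or r)


# Compare both sides under all eight truth assignments.
classically_equivalent = all(
    lhs(p, q, r) == rhs(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)
```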

We do this in two stages: first we have to convert our problem into an ap- 
propriate normal form, and then we have to adapt the basic Satchmo engine to 
work constructively with this normal form. 

Normal form: 

The construction of a normal form proceeds in three stages. 

(i) We start by making very straightforward textual changes, to make standard 
logical form look a bit more like Prolog and to get rid of existential quantifiers. 

NF-1 Replace (A & B) by (A', B') and (A or B) by (A'; B'), where A' and B' are the normal forms of A and B.

NF-2 Replace not(A) by (A' => absurd).

NF-3 Replace P => (Q => R) by ((P & Q) => R)'.

NF-4 Skolemise away existential quantifiers, and remove all universal quantifiers. 



(ii) Separate the result of (i) into Horn and non-Horn clauses, and convert the Horn clauses to ordinary Prolog.

PL-1 If the normal form of P is atomic then assert it as a Prolog fact.

PL-2 If the normal form of P is (Q, R) then deal with Q and R individually.

PL-3 If the normal form of P is (Q; R) then assert split(Q; R) as a Prolog fact.

PL-4 If the normal form of P is (K => Q) where Q is atomic then assert Q :- K as a Prolog rule.

PL-5 If the normal form of P is (K => (Q, R)) then deal with (K => Q) and (K => R) individually.

PL-6 If the normal form of P is (K => (Q; R)) then assert split(Q; R) :- K as a Prolog rule.

(iii) Perform any optimisations that you can on these. We will return to this in 
Section 3. 
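Stages (i) and (ii) can be sketched for the propositional case as follows. This is a minimal illustration under stated assumptions: formulas are nested tuples, quantifiers are assumed to have been removed already (NF-4), and the names `normal_form` and `to_clauses` and the three output lists are hypothetical stand-ins for asserting Prolog facts, rules and split clauses.

```python
# Input formulas: an atom is a string; compound formulas are
#   ("and", A, B), ("or", A, B), ("not", A), ("imp", A, B).
# Normal forms use ",", ";" and "=>" as connectives, plus the atom "absurd".

def normal_form(f):
    """Apply NF-1 to NF-3; quantifiers are assumed to be gone already."""
    if isinstance(f, str):
        return f
    op = f[0]
    if op == "and":                                   # NF-1
        return (",", normal_form(f[1]), normal_form(f[2]))
    if op == "or":                                    # NF-1
        return (";", normal_form(f[1]), normal_form(f[2]))
    if op == "not":                                   # NF-2
        return ("=>", normal_form(f[1]), "absurd")
    if op == "imp":
        p, q = normal_form(f[1]), normal_form(f[2])
        if not isinstance(q, str) and q[0] == "=>":   # NF-3
            return ("=>", (",", p, q[1]), q[2])
        return ("=>", p, q)
    raise ValueError("unknown connective: %r" % (op,))


def to_clauses(nf, facts, rules, splits):
    """Apply PL-1 to PL-6.

    facts:  list of atoms
    rules:  list of (head, body)
    splits: list of ((Q, R), body); body is None for an unconditional split
    """
    if isinstance(nf, str):                           # PL-1
        facts.append(nf)
    elif nf[0] == ",":                                # PL-2
        to_clauses(nf[1], facts, rules, splits)
        to_clauses(nf[2], facts, rules, splits)
    elif nf[0] == ";":                                # PL-3
        splits.append(((nf[1], nf[2]), None))
    else:                                             # nf is (K => C)
        k, c = nf[1], nf[2]
        if isinstance(c, str):                        # PL-4
            rules.append((c, k))
        elif c[0] == ",":                             # PL-5
            to_clauses(("=>", k, c[1]), facts, rules, splits)
            to_clauses(("=>", k, c[2]), facts, rules, splits)
        elif c[0] == ";":                             # PL-6
            splits.append(((c[1], c[2]), k))
        else:                                         # nested =>: NF-3 again
            to_clauses(("=>", (",", k, c[1]), c[2]), facts, rules, splits)


facts, rules, splits = [], [], []
formula = ("and", "a", ("imp", "b", ("and", "c", ("or", "d", "e"))))
to_clauses(normal_form(formula), facts, rules, splits)
```

For the example formula, `a` ends up as a fact, `c :- b` as a rule, and `split(d; e) :- b` as a split clause.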

Constructive Satchmo 

Once we have the problem converted to normal form, we can use the following 
adaptation of the basic model generation algorithm. 

MG-1 If you can prove A by using Prolog facts and rules then you can prove it.

MG-2 If you can prove split(P; Q) and you can show that you could prove A if you had either P or Q then you can prove A. This step corresponds to or-elimination.

MG-3 In order to prove (A => B), you have to add A to your set of Prolog facts and rules and then show that you can prove B. This corresponds to →-introduction.

Steps MG-1 and MG-2 are exactly as in the original presentation of Satchmo, except that since Satchmo works by trying to show that the hypotheses plus the negation of the goal are unsatisfiable it always tries to prove absurd, whereas a constructive version has to show that the goal itself is provable from the hypotheses. Step MG-3 is introduced because Satchmo relies on the classical equivalence between ((P → Q) → R) and (Q → R) & (P or R) when constructing normal forms. This equivalence is no longer available: if we want to prove P → Q we have to use →-introduction. Fig. 2 provides a skeletal implementation of this. The only non-cosmetic differences between this and Satchmo are that (i) this version implements a constructive version of first-order logic rather than a classical one, and (ii) it is slightly more direct when faced with clauses of the form ((P → Q) → R). Most of the work in Satchmo is performed in the backward chaining phase where the Prolog facts and rules are being used to prove specific goals. By converting ((P → Q) → R) to R :- (P => Q), we ensure that this rule is activated when it is required, at the cost of having to prove P => Q by asserting P and showing that Q follows from it. If we convert ((P → Q) → R) to R :- Q and split(R; P), we end up having to explore the consequences of asserting P anyway.
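The three steps MG-1 to MG-3 can be sketched for the propositional case as follows. This is an illustration under stated assumptions rather than the paper's Prolog implementation: atoms are strings, an implication goal is the tuple `("=>", p, b)` with an atomic antecedent, the class and method names are hypothetical, and (unlike plain Prolog backward chaining) subgoals are proved with the full procedure, with a set of goals in progress used to cut loops.

```python
class Prover:
    """Propositional sketch of constructive model generation."""

    def __init__(self, facts, rules, splits):
        self.facts = set(facts)   # atoms known to hold
        self.rules = rules        # list of (head, [body goals])
        self.splits = splits      # list of ((p, q), [body goals])
        self.active = set()       # goals in progress, used to cut loops

    def prove(self, goal):
        if isinstance(goal, tuple):           # MG-3: goal is ("=>", p, b)
            _, p, b = goal
            added = p not in self.facts
            self.facts.add(p)
            try:
                return self.prove(b)
            finally:
                if added:                     # tidy up the assumption
                    self.facts.discard(p)
        key = (goal, frozenset(self.facts))
        if key in self.active:                # already trying this: give up
            return False
        self.active.add(key)
        try:
            return self.backchain(goal) or self.by_cases(goal)
        finally:
            self.active.discard(key)

    def backchain(self, goal):                # MG-1
        if goal in self.facts:
            return True
        return any(head == goal and all(self.prove(g) for g in body)
                   for head, body in self.rules)

    def by_cases(self, goal):                 # MG-2 (or-elimination)
        for (p, q), body in self.splits:
            if p in self.facts or q in self.facts:
                continue                      # this split is already decided
            if all(self.prove(g) for g in body):
                if (self.prove(("=>", p, goal))
                        and self.prove(("=>", q, goal))):
                    return True
        return False


prover = Prover(
    facts=[],
    rules=[("wet", ["rain"]), ("wet", ["snow"])],
    splits=[(("rain", "snow"), [])],
)
wet_provable = prover.prove("wet")     # follows in both cases of the split
rain_provable = prover.prove("rain")   # neither disjunct follows on its own
```

The disjunction supports a proof of `wet` by cases, but not a proof of either disjunct alone, which matches the constructive reading of or-elimination.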

    % You can prove A either directly
    prove(A) :-
        A.

    % or by proving (P or Q), (P => A) and (Q => A)
    prove(A) :-
        split(P; Q),
        \+ (P; Q),        % check you haven't tried this already
        prove(P => A),
        prove(Q => A).

    % To prove (P => A), assert P and try to prove A
    % (with some funny bookkeeping to tidy up after yourself)
    prove(P => A) :-
        assert(P),
        (prove(A) -> retract(P); (retract(P), fail)).

Fig. 2. Basic constructive Satchmo

2.1 Adding Abstraction and λ-Reduction

According to λ-intro and λ-elim, (λxA).t and $A_{t/x}$ are equivalent, and according to Theorem 1 there is no problem with this so long as none of your assumptions or hypotheses contain irreducible instances of (λxP).t. We would therefore like to add the following step to the normal forming process:

NF-5 Replace lambda(X, P):T by $P_{T/X}$.

This would eliminate all instances of (λxA).t before we ever started trying to use the underlying inference engine, so that including such expressions in our problem statement would have no effect whatsoever on the performance of the theorem prover. Unfortunately, the definition of Λ(C) also allows formulae of the form x.y, where x and y are variables. If it did not, then the language would not really be all that different from ordinary first-order logic. But since it does, we have a problem with producing the correct normal form for such cases. We need one final normal form rule:

PL-7 If the normal form of P is Q => (X:A), where X is a variable, then replace it by split(X:A) :- Q.

We also have to extend the inference engine to take account of these new cases, as shown in Fig. 3. The new clause for prove(A) reflects the decision that clauses with λ-applications involving uninstantiated functions as their heads should be used forwards, like clauses with disjunctive conclusions. The point here is that since we do not know what the conclusion of such a clause is, we have no way of telling whether it is likely to be useful. We therefore leave them out of the backwards chaining part of the proof procedure, and simply allow them to emerge when there is nothing more obvious to try.

    % You can also prove A if (lambda(X, P):T) => A
    % and you can prove P(T/X)
    prove(A) :-
        split(P:T),
        nonvar(P),
        fully_reduce(P:T, R),
        prove(R => A).

    % To prove (P:T), try proving P(T/X) instead
    prove(P:T) :-
        nonvar(P),
        fully_reduce(P:T, R),
        prove(R).

Fig. 3. Extending Satchmo to cover λ-intro and λ-elim

The clause for proving (P:T) says that if you know what P is then you should actually work with the β-reduced version. This is guaranteed to work precisely because if (P:T) were provable from the original problem statement then we will have either replaced it directly by $P_{T/X}$ during the normal forming process, or we will eventually do so when we explore the consequences of split(P:T).

The program outlined above provides a sound and reasonably efficient theorem prover for Λ(C). The soundness is guaranteed by Theorem 1 and Theorem 2. The reasonable efficiency is inherited from Satchmo, together with the fact that by proving (λxA).t from $A_{t/x}$ we start working backwards as soon as we possibly can. Completeness is in any case unavailable, since even first-order logic is only semi-decidable, as is the task of deciding whether a λ-application is reducible (see [10]).

Theorem 2. The algorithm outlined in Fig. 2 and Fig. 3 is sound.

Proof.

Suppose that the algorithm is not in fact sound, i.e. that there is a proof of absurd from some set of clauses {A1, ..., An} using the algorithm in Fig. 2 and Fig. 3 which would not have led to such a proof just using the algorithm in Fig. 2. At some point the proof must have either (i) used a splitting rule to derive $P_{T/X}$ from lambda(X, P):T, or (ii) proved lambda(X, P):T by proving $P_{T/X}$. In case (i) we could have proved absurd from {A1, ..., An, lambda(X, P):T}, and in case (ii) we could have proved it from {A1, ..., An, $P_{T/X}$}. In either case we have a proof of absurd using one less application of the relevant rule. We can repeat this until there are no applications of either of the rules from Fig. 3, in which case we have a set {A1, ..., An, K1, ..., Km} which supports a proof of absurd just using the rules in Fig. 2. □

3 Optimisations 

The program described above is 'reasonably efficient': as efficient, that is, as something based on Satchmo could be expected to be. As noted earlier, however, Satchmo is very effective for certain kinds of problem, less so for others. There are known optimisations for Satchmo, particularly for lessening the impact of irrelevant disjunctive clauses. We will consider a number of these in the context of the extension of Satchmo to deal with Λ(C).

3.1 Deleting and Reinstating Pure Literals 

The first move is to introduce [4]'s notion of 'pure literals'. Suppose that we have a clause of the form

    p(X, Y) :- p1(X, Y), ..., pk(X, Y), ..., pn(X, Y),

but there is no clause with pk(U, V) as its head. Then clearly there is no point in using this rule when attempting to prove p(a, b), since the k-th subgoal is bound to fail. So we may as well remove it from our clause set. The subgoal pk(U, V) is said to be pure.

But this might be the only clause which supports proofs of p(U, V). In 
that case, removing it may well make it possible to delete some other clause, 
which may . . . Kowalski shows how this kind of ‘gangrene’ can lead to quite 
dramatic reductions in the problem statement. The effect tends not to be quite 
so dramatic in the context we are working in (meaning postulates for lexical 
semantics in natural language), but there are in any case two problems with it. 

(i) Kowalski’s original presentation marked a literal L as being pure if there was 
no clause containing a complementary literal L' which would unify with it. We, 
however, are working with equality as well as intensionality. Suppose our initial 
problem consists of the following: 



    male(f)
    male(f) & parent(f, a) → father(f, a)
    parent(j, a)
    f = j

Fig. 4. Rule set with an apparently pure literal

It seems as though there is nothing which could support parent(f, a), since the only potentially relevant literal, parent(j, a), does not unify with it. This suggests that we can delete the rule for proving father(f, a). This is clearly too strong, since the presence of the equality means that we can prove parent(f, a), so that we should not delete this rule. We are therefore restricted to saying that we can delete a rule if it contains a subgoal g(t1, t2) for which there is no clause whose head has g as its functor and arity 2. This is weaker than Kowalski's notion, and hence is less likely to lead to drastic reductions in the search space.

(ii) To make matters worse, however, the fact that we have clauses with completely underspecified heads means that we actually have no idea at all what literals might actually be provable. We therefore cannot simply throw away clauses with pure literals, since it is entirely possible that a literal may become impure as a result of the forward application of some clause resulting from a formula with an intensional consequent. The best we can do is to put them to one side until they become impure, and then reinstate them. Thus instead of deleting a clause whose antecedent contains pure literals, we store it in such a way that it can easily be reinstated. Fig. 5 shows the form in which father(f, a) :- male(f), parent(f, a) would be stored if parent(f, a) was pure, and we include an extra step in the treatment of conditional proofs, as in Fig. 6.



    impure(parent(X, Y),
           (father(f, a) :- male(f), parent(f, a))).



Fig. 5. Storing a clause with a pure literal for later reinstatement 



    prove(A => B) :-
        assert(A),
        impure(A, CLAUSE),
        assert(CLAUSE),
        (prove(B) ->
            retract(A);
            (retract(A), retract(CLAUSE), fail)).



Fig. 6. What to do when a literal becomes impure 

Split rules transform themselves into conditional proofs, since in each case the 
result of a split rule is to introduce a request for a proof that the new information 
would lead to a proof. Fig. 2 dealt with such requests by adding the antecedent 
of the clause to the set of facts and then attempting to show that the goal is 
provable under these new circumstances. Fig. 6 simply makes sure that anything 
that the new facts would help with is made available before the proof continues. 

This introduces many of the benefits of pure literal deletion in a context 
where clauses containing pure literals may suddenly become available as a result 
of moves which could not have been predicted. The cost is a small fixed time 
search for reinstatable clauses every time you undertake a conditional proof. 
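The weaker purity test from item (i) and the store-and-reinstate idea from item (ii) can be sketched together as follows. This is an illustration under stated assumptions, not the paper's Prolog code: clauses are `(head, body)` pairs of tuple atoms, facts are clauses with an empty body, and the names `signature`, `partition_clauses` and `reinstate` are hypothetical. A reinstated clause is not re-checked for other pure subgoals in this sketch.

```python
def signature(atom):
    """Functor and arity of an atom written as (functor, arg1, ..., argn)."""
    return (atom[0], len(atom) - 1)


def partition_clauses(clauses):
    """Separate active clauses from suspended ones.

    A subgoal is treated as pure if no active clause head has the same
    functor and arity. Suspended clauses are indexed by the signature of
    one of their pure subgoals so that they can be found again quickly.
    """
    active = list(clauses)
    stored = {}
    changed = True
    while changed:                    # suspending a clause can purify others
        changed = False
        heads = {signature(head) for head, _ in active}
        for clause in active:
            pure = [g for g in clause[1] if signature(g) not in heads]
            if pure:
                active.remove(clause)
                stored.setdefault(signature(pure[0]), []).append(clause)
                changed = True
                break                 # recompute the set of heads
    return active, stored


def reinstate(fact, active, stored):
    """Revive clauses that were waiting for something like `fact`.

    Reviving a clause makes its head available, which may in turn revive
    clauses that were waiting for that head.
    """
    pending = [signature(fact)]
    revived = []
    while pending:
        for clause in stored.pop(pending.pop(), []):
            active.append(clause)
            revived.append(clause)
            pending.append(signature(clause[0]))
    return revived


male = (("male", "f"), [])
father = (("father", "f", "a"), [("male", "f"), ("parent", "f", "a")])
dad = (("dad", "f"), [("father", "f", "a")])

active, stored = partition_clauses([male, father, dad])
revived = reinstate(("parent", "f", "a"), active, stored)
```

With no clause for `parent/2`, the rule for `father` is suspended, which in turn suspends the rule for `dad`; asserting a `parent` fact during a conditional proof brings both back.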

3.2 Relevance 

Satchmo can be made to perform very poorly if you include disjunctive clauses where one of the disjuncts is irrelevant. The problem is that each use of such a clause, say A → B or C, introduces two conditional proofs, namely B → G and C → G, where G is the top-level goal you are trying to prove. If B is not going to contribute to a proof of G then B → G will be provable precisely if G itself is, and likewise for C → G. But then at least one of the branches introduced by using A → B or C is at least as hard as just proving G without using this clause.
[7] and [5] show that you can deal with this problem by banning the use of a split clause until it has been shown that its consequents will contribute to a proof of G. The current implementation continues to take the approach of dynamically reinstating such a clause when at least one of its disjuncts has shown up in a failed attempted proof. The key is that instead of including such a clause in the database directly, we assert a rule which will itself add the clause when its consequents have been shown to be relevant. If we had, for instance, a clause like split(p(X); q(X)) :- g1(X), ..., gn(X) we would replace it by the clauses shown in Fig. 7.



    p(_Y) :-
        \+ clause((split(p(X); q(X)) :- g1(X), ..., gn(X))),
        assert((split(p(X); q(X)) :- g1(X), ..., gn(X))),
        fail.

    q(_Y) :-
        \+ clause((split(p(X); q(X)) :- g1(X), ..., gn(X))),
        assert((split(p(X); q(X)) :- g1(X), ..., gn(X))),
        fail.

Fig. 7. Relevance checking 



The first of these says that if you find yourself trying to prove p(t), and you don't already have this split clause available to you, then add it to the database, and likewise for q(t). This provides a very cheap way of implementing the requirement that at least one of the disjuncts should be potentially relevant to something that you actually want to prove. It is less easy to provide a cheap test to ensure that both disjuncts are desirable: you have to choose between a cheap test that may still allow some undesirable cases through [7], and a more expensive one which is more rigorous [5].
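The trigger mechanism can be sketched outside Prolog as a pair of lists: split clauses start out dormant and are moved to the active list the first time a goal matching one of their disjuncts is attempted. The class and method names below are hypothetical, and matching is by predicate name and arity only, which is an assumption of this sketch.

```python
class LazySplits:
    """Keep each split clause dormant until one of its disjuncts is sought."""

    def __init__(self, dormant):
        # dormant: list of (disjuncts, body), where disjuncts is a tuple of
        # atoms and every atom is a tuple (predicate, arg1, ..., argn)
        self.dormant = list(dormant)
        self.active = []

    def seek(self, goal):
        """Record an attempt to prove `goal`.

        Any dormant split with a disjunct of the same predicate and arity
        is activated. The method always returns False: like a trigger
        clause that ends in `fail`, it never proves the goal itself.
        """
        for split in list(self.dormant):
            disjuncts, _body = split
            if any(d[0] == goal[0] and len(d) == len(goal) for d in disjuncts):
                self.dormant.remove(split)
                self.active.append(split)
        return False


split_pq = ((("p", "X"), ("q", "X")), [("g1", "X")])
store = LazySplits([split_pq])
store.seek(("r", "a"))              # irrelevant goal: nothing happens
active_after_r = list(store.active)
store.seek(("q", "a"))              # relevant goal: the split is activated
active_after_q = list(store.active)
```

Because a split moves from one list to the other, seeking the same goal again cannot add it twice, which plays the role of the `\+ clause(...)` check.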

4 Conclusions 

The table in Fig. 8 shows the effects of the various optimisations discussed in Section 3 on the performance of the system when applied to a specific task from our NLP domain (space precludes a detailed discussion of the particular task: what matters here is the effect of the various optimisations). It turns out that pure literal deletion and relevance checking interact in unexpected ways². Strong purification corresponds to deleting clauses if they contain literals for which there is no relevant Horn clause, weak purification corresponds to deleting them if there is no relevant clause at all. If you use strong purification with the relevance check from Section 3.2 then clauses will be deleted because their only support comes from some disjunctive clause; but disjunctive clauses are now only triggered if they would be helpful. The problem is that the clauses that the disjunctive clauses would have been helped by (which are their triggers) will have been purified, and hence will never be accessed.

² Well I didn't expect it, and it took a lot of tracking down!





                             +groundedness   -groundedness
    Unoptimised                  1.62            1.98
    Strong purification          0.38            0.42
    Relevance checking           1.55            2.04
    Weak purification            0.54            0.59
    Relevance + weak pur.        0.52            0.59



Fig. 8. Relative effects of optimisations 



The two columns marked ± groundedness show the effect of blocking repeated 
proofs of the same ground fact. 

The key observation is that the optimisations do improve the performance. It's what optimisations are supposed to do, of course, but it's always reassuring when they do. It is striking, however, that the effect of relevance checking with this particular problem is extremely marginal. If we simply add relevance checking to the basic system, we get a small improvement; if we add it to the weak form of purification we get a small improvement; but the best performance comes from the strong version of purification, which cannot be combined with relevance checking. It seems likely that the relative effectiveness of different combinations will depend on the exact mix of sequents in the problem statement. The move to a dynamic version of pure literal deletion was forced on us by the fact that we are working in an intensional context, where it is not possible to permanently delete clauses, since they may be impurified at any time. It turns out to work very nicely with Satchmo, since it means that we can be much more ruthless about what we delete.
Abstract. Decision trees are considered an efficient technique to express classification knowledge and to use it. However, their most standard algorithms do not deal with uncertainty, especially the cognitive one.

In this paper, we develop a method to adapt the decision tree technique to the case where the object's classes are not exactly known, and where the uncertainty about the class' value is represented by a belief function. The adaptation concerns both the construction of the tree and its use to classify new objects characterized by uncertain attribute values.



1 Introduction 

Decision trees are among the well known machine learning techniques. They 
are widely used in a variety of fields notably in artificial intelligence applica- 
tions. Their success is explained by their ability to handle complex problems 
by providing an understandable representation easier to interpret and also their 
adaptability to the inference task by producing logical rules of classification. 

Several methods [1,5,7] have been proposed to construct decision trees. These algorithms take as input a training set composed of instances, each described by a set of attribute values and its assigned class. The output is a decision tree ensuring the classification of new instances.

A major problem faced by the standard decision tree algorithms results from the uncertainty encountered in the data. This uncertainty can appear either in the construction or in the classification phase. Ignoring it can affect the efficiency of the obtained results.

In order to overcome this drawback, probabilistic decision trees have been developed by Quinlan [6]. This kind of tree presents small extensions over the standard one, and its use remains limited since it only deals with statistical uncertainty induced by information arising from random behavior.

The objective of this paper is to develop what we call a belief decision tree, a classification method adapting the decision tree approach to uncertain data, where the uncertainty is represented by belief functions as defined in the Transferable Belief Model (TBM). The choice of the TBM seems appropriate as it provides a convenient framework [2] for dealing with limited and uncertain information, notably that given by experts.

This paper is organized as follows: Section 2 provides a brief description of standard decision tree algorithms. In Section 3, the basics of the belief function theory are recalled. Our approach regarding a belief decision tree is described in Section 4. Both the construction and classification procedures will be detailed. Finally, an example explaining these two procedures is proposed in Section 5.



2 Basics of Decision Tree Algorithms 

Several algorithms have been developed for learning decision trees [1,5,7]. In the artificial intelligence community, the most used is based on the TDIDT¹ approach. In that approach, the tree is constructed by employing a recursive divide and conquer strategy. Its steps can be defined as follows:

- By using an attribute selection measure, an attribute will be chosen in order to partition the training set in an "optimal" manner.

- Based on a partitioning strategy, the current training set will be divided into training subsets by taking into account the values of the selected attribute.

- When the stopping criterion is satisfied, the training subset will be declared as a leaf.

In the literature, many attribute selection measures have been proposed [3,5,7]. Among the most used, we mention the information gain used within the ID3 algorithm [5]. The information gain of an attribute A relative to a set of objects S measures the effectiveness of A in classifying the training data. It is defined as follows:

$$Gain(S, A) = Info(S) - Info_A(S)$$

where

$$Info(S) = -\sum_{i=1}^{n} p_i \log_2 p_i \quad \text{and} \quad Info_A(S) = \sum_{v \in Domain(A)} \frac{|S_v|}{|S|}\, Info(S_v)$$

where $p_i$ is the proportion of objects in S belonging to the class $C_i$ (i = 1..n) and $S_v$ is the subset of objects for which the attribute A has the value v.
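As a worked illustration of the information gain, the sketch below computes Info(S) as the entropy of the class labels and subtracts the weighted entropy of the subsets induced by one attribute. The data layout (a list of dicts plus a parallel list of labels), the function names, and the toy data are assumptions made for this example.

```python
from collections import Counter
from math import log2


def info(labels):
    """Entropy Info(S) of a non-empty list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def gain(rows, labels, attribute):
    """Gain(S, A) for rows given as dicts; `attribute` is a key of each row."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        # Group the class labels by the value v of the attribute.
        subsets.setdefault(row[attribute], []).append(label)
    info_a = sum(len(s) / total * info(s) for s in subsets.values())
    return info(labels) - info_a


rows = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rain", "windy": True},
    {"outlook": "rain", "windy": False},
]
labels = ["yes", "yes", "no", "no"]

gain_outlook = gain(rows, labels, "outlook")   # separates the classes fully
gain_windy = gain(rows, labels, "windy")       # tells us nothing
```

In this toy set, `outlook` determines the class, so its gain equals the full entropy of one bit, while `windy` leaves each subset as mixed as the whole set.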

Although it has shown good results, this measure has a serious limitation. It favors attributes with a large number of values over those with few values [7]. To overcome this shortcoming, Quinlan [5,7] suggests another attribute selection measure, called the gain ratio and defined by:

$$\text{Gain ratio}(S, A) = \frac{Gain(S, A)}{\text{Split Info}(A)}$$

where

$$\text{Split Info}(A) = -\sum_{v \in Domain(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}$$

¹ Top-Down Induction of Decision Tree

Split Info(A) measures the information content of the attribute A itself [5]. The gain ratio is the information gain calibrated by Split Info. Note that when the ratio is not defined, this criterion selects attributes among those with an average or better information gain [5].
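The calibrating effect of Split Info can be seen on a toy example in which an identifier-like attribute takes a different value for every object. The sketch below is an illustration with hypothetical names and data; it returns `None` when Split Info is zero, leaving the fallback to average-or-better gain out of scope.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Entropy in bits of a non-empty list of values."""
    total = len(values)
    return -sum((n / total) * log2(n / total)
                for n in Counter(values).values())


def gain_ratio(column, labels):
    """Gain ratio of one attribute.

    `column` holds the attribute's value for each object, aligned with
    `labels`. Returns None when Split Info is 0 (the ratio is undefined).
    """
    total = len(labels)
    subsets = {}
    for value, label in zip(column, labels):
        subsets.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(s) / total * entropy(s)
                                 for s in subsets.values())
    split_info = entropy(column)   # Split Info(A): entropy of A's own values
    if split_info == 0:
        return None
    return gain / split_info


labels = ["yes", "yes", "no", "no"]
ratio_outlook = gain_ratio(["sunny", "sunny", "rain", "rain"], labels)
ratio_id = gain_ratio([1, 2, 3, 4], labels)
ratio_constant = gain_ratio(["x", "x", "x", "x"], labels)
```

Both `outlook` and the identifier have an information gain of one bit here, but the identifier's Split Info is two bits, so its gain ratio is halved.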

Once constructed, the decision tree is used to classify new objects. For a new 
instance, we start with the root, we evaluate the relative test attribute and we 
take the branch corresponding to the test’s outcome. This process is repeated 
until a leaf is encountered. The new object belongs to the class labeling the leaf. 

3 Belief Function Theory 

In this section, we briefly review the main concepts underlying the theory of 
belief functions [8,10,11]. 



3.1 Definitions 

Let $\Theta$ be a finite set of elementary events called the frame of discernment. The basic 
belief assignment (bba) is a function $m: 2^\Theta \rightarrow [0, 1]$ such that $\sum_{A \subseteq \Theta} m(A) = 1$. 

The value m(A) represents the part of belief supporting exactly that the 
actual event belongs to A and nothing more specific. The subsets A of $\Theta$ such 
that m(A) > 0 are called focal elements. 

Associated with m is the belief function [10] defined for $A \subseteq \Theta$ as: 
$bel(A) = \sum_{\emptyset \neq B \subseteq A} m(B)$. The degree of belief bel(A) given to a subset A of the frame $\Theta$ 
is defined as the sum of all the masses given to subsets that support A. 

Total ignorance is easily represented in belief function 
theory. It is represented by the so-called vacuous belief function [8], i.e., the 
belief function whose bba satisfies $m(\Theta) = 1$ and m(A) = 0 for all $A \neq \Theta$. 
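As a small illustration of these definitions, a bba can be stored as a mapping from focal elements to masses. The representation below (Python dictionaries keyed by `frozenset`) and the class names are assumptions chosen for the sketch, not part of the theory.

```python
def bel(m, a):
    """bel(A): sum of the masses of the non-empty focal elements included in A."""
    return sum(mass for b, mass in m.items() if b and b <= a)

THETA = frozenset({"C1", "C2", "C3"})          # hypothetical frame of discernment
m = {frozenset({"C1"}): 0.7, THETA: 0.3}       # two focal elements
vacuous = {THETA: 1.0}                         # total ignorance

print(bel(m, frozenset({"C1"})))               # mass committed to {C1}
print(bel(vacuous, frozenset({"C1", "C2"})))   # 0: nothing supports a strict subset
```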

3.2 Rules of Combination 

Let $m_1$ and $m_2$ be two basic belief assignments induced from two distinct pieces 
of evidence. These bbas can be combined either conjunctively or disjunctively. 

1. The Conjunctive Rule: When we know that both sources of information are 
fully reliable then the bba representing the combined evidence satisfies [12]: 

$$(m_1 \wedge m_2)(A) = \sum_{B, C \subseteq \Theta:\, B \cap C = A} m_1(B)\, m_2(C) \quad \text{for } A \subseteq \Theta$$

2. The Disjunctive Rule: When we only know that at least one of the sources 
of information is reliable but we do not know which one, then the bba 
representing the combined evidence satisfies [12]: 

$$(m_1 \vee m_2)(A) = \sum_{B, C \subseteq \Theta:\, B \cup C = A} m_1(B)\, m_2(C) \quad \text{for } A \subseteq \Theta$$
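Both rules differ only in the set operation applied to each pair of focal elements, which the following sketch makes explicit. The dictionary representation of a bba and the numerical values are assumptions for illustration.

```python
from itertools import product

def combine(m1, m2, op):
    """Sum m1(B) * m2(C) over all pairs (B, C), grouped by op(B, C)."""
    out = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = op(b, c)
        out[a] = out.get(a, 0.0) + mb * mc
    return out

def conjunctive(m1, m2):
    return combine(m1, m2, lambda b, c: b & c)   # B intersected with C

def disjunctive(m1, m2):
    return combine(m1, m2, lambda b, c: b | c)   # B united with C

THETA = frozenset({"C1", "C2", "C3"})
m1 = {frozenset({"C1"}): 0.6, THETA: 0.4}
m2 = {frozenset({"C2"}): 0.5, THETA: 0.5}

conj = conjunctive(m1, m2)   # mass 0.3 goes to the empty set (conflict)
disj = disjunctive(m1, m2)   # mass 0.3 on {C1, C2}, the rest on THETA
```

Note that the conjunctive rule is left unnormalized here, so conflicting evidence shows up as mass on the empty set.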
3.3 Vacuous Extension of Belief Functions 

Let X and Y be two sets of variables such that $Y \subseteq X$. Let $m^Y$ be a bba defined 
on the domain $\Theta_Y$ of Y. The extension of $m^Y$ to $\Theta_X$, denoted $m^{Y \uparrow X}$, means that 
the information in $m^Y$ is extended to the larger frame X [4]: 

$$m^{Y \uparrow X}(A \times \Theta_{X-Y}) = m^Y(A) \quad \text{for } A \subseteq \Theta_Y$$

$$m^{Y \uparrow X}(B) = 0 \quad \text{if } B \text{ is not of the form } A \times \Theta_{X-Y}$$
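A minimal sketch of the vacuous extension, assuming for simplicity that Y is a single variable and that each focal element is a `frozenset` of values; the function name and the sample domains are hypothetical.

```python
from itertools import product

def vacuous_extension(m_y, other_domains):
    """Map each focal element A of m_y to the cylinder A x Theta_{X-Y}.

    m_y           : dict frozenset of values of Y -> mass
    other_domains : list of iterables, the domains of the variables in X - Y
    """
    return {
        frozenset(product(a, *other_domains)): mass
        for a, mass in m_y.items()
    }

m_y = {frozenset({"a"}): 0.4, frozenset({"a", "b"}): 0.6}
ext = vacuous_extension(m_y, [("yes", "no")])
# The focal element {a} becomes {(a, yes), (a, no)} and keeps its mass 0.4.
```

Subsets of the product space that are not cylinders simply do not appear as keys, which corresponds to a mass of 0.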

3.4 Pignistic Transformation 

The decision making problem is solved in the TBM framework by using the 
pignistic probability function defined and fully explained in [10]: 

$$BetP(\theta) = \sum_{A \subseteq \Theta,\, \theta \in A} \frac{m(A)}{|A|\,(1 - m(\emptyset))} \quad \text{for all } \theta \in \Theta$$

It is the only transformation between belief functions and probability functions 
that satisfies some natural rationality requirements. The major one is described 
as follows: Suppose two contexts $C_1$ and $C_2$, suppose your beliefs in context $C_i$ 
are represented by $m_i$ and that the choice of the context obeys some random 
process, with $P(C_1) = p$ and $P(C_2) = q$ with p + q = 1. Let P denote the 
operator that transforms a bba into a probability function. We want it to 
satisfy: 

$$P(p\, m_1 + q\, m_2) = p\, P(m_1) + q\, P(m_2)$$

This translates the property that transforming the belief held before knowing the 
context that will be selected is the same as combining the conditional probability 
functions one would have obtained if the context had been known. Full details can 
be found in [10]. The probability function so obtained is then used to compute 
the expected utilities needed for optimal decision making. 
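The pignistic transformation and the linearity requirement can be checked numerically with a short sketch. The bba representation (a dictionary keyed by `frozenset`) and the sample masses are assumptions; the check uses bbas with no mass on the empty set.

```python
def betp(m, frame):
    """Pignistic probability BetP(theta) for every theta in `frame`."""
    empty = m.get(frozenset(), 0.0)
    return {
        t: sum(mass / (len(a) * (1.0 - empty)) for a, mass in m.items() if t in a)
        for t in frame
    }

THETA = frozenset({"C1", "C2", "C3"})
m1 = {frozenset({"C1"}): 0.7, THETA: 0.3}
m2 = {frozenset({"C3"}): 1.0}
p, q = 0.4, 0.6

# Mixture p*m1 + q*m2, built focal element by focal element.
mix = {}
for weight, m in ((p, m1), (q, m2)):
    for focal, mass in m.items():
        mix[focal] = mix.get(focal, 0.0) + weight * mass

lhs = betp(mix, THETA)
rhs = {t: p * betp(m1, THETA)[t] + q * betp(m2, THETA)[t] for t in THETA}
# lhs and rhs agree up to floating-point rounding.
```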

4 Belief Decision Tree 

In this section, we define the structure of the decision tree within the belief 
function framework, called a belief decision tree, and then present the notations 
that will be used in this paper. Next, we develop the two major procedures of a 
decision tree: the construction and the classification procedures. 



4.1 Decision Tree Structure in the Belief Function Context 

Any decision tree is constructed from a training set of objects based on successive 
refinements. Due to the uncertainty, the structure of the training set may be 
different from the traditional one. In fact, we assume that the uncertainty 
lies only in the classes of the training instances. That is, our training set is composed 
of objects where the value of each attribute is known with certainty, whereas 
there is some uncertainty regarding the corresponding class. 

We propose to associate with each training instance $I_j$, j = 1..p, a bba, denoted 
$m^\Theta\{I_j\}$, defined on the set of possible classes $\Theta$ to which the object $I_j$ can 
belong, and representing the beliefs given by an expert (or several experts) on the 
actual class of the object $I_j$. This representation is also appropriate to describe 
the classical case where the object's class is exactly known. 

Once the structure of the training set is defined, our belief decision tree is 
composed by the same elements as in the traditional tree. However, due to the 
uncertainty in training instances’ classes, the structure of the leaves will change. 
Instead of assigning a unique class to each leaf, it will be labeled by a bba 
expressing a belief about the actual class of the objects belonging to the leaf. 

4.2 Notations and Assumptions 

In this paper, we use the following notations: 

- S: a given set of objects, 

- $I_j$: an instance (object, case, example), 

- $\mathcal{A} = \{A_1, A_2, \ldots, A_k\}$: a set of k attributes, 

- $D(A_i)$: the domain of the attribute $A_i \in \mathcal{A}$, 

- $A(I_j)$: the value of the attribute A for the object $I_j$, 

- $S_v^A = \{I_j : A(I_j) = v\}$: the subset of objects whose value for attribute $A \in \mathcal{A}$ 
is $v \in D(A)$, 

- $\Theta = \{C_1, C_2, \ldots, C_n\}$: the frame of discernment involving the possible classes 
related to the classification problem, 

- $C(I_j)$: the actual class of the object $I_j$, 

- $m_g^\Theta\{I_j\}[A](C)$ denotes the conditional bba given to $C \subseteq \Theta$ relative to object 
$I_j$ given by an agent g that accepts that A is true. Useless indices are omitted. 

4.3 Procedure for Constructing a Belief Decision Tree 

As mentioned, the algorithm to construct a decision tree, also called the induction 
task, is based on three major parameters: the attribute selection measure, the 
partitioning strategy and the stopping criterion. These parameters must take into 
account the uncertainty encountered in the training set. 



Attribute Selection Measure. Our attribute selection measure has to take 
into account the bba of each object in the training set. The idea is to adapt the 
gain ratio proposed by Quinlan [7] to this uncertain context. 

In order to define the gain ratio measure of an attribute A over a set of 
objects S within the TBM framework, we propose the following steps: 

1. For each object $I_j$ in S, we have a bba $m^\Theta\{I_j\}$ that represents our belief about 
the value of $C(I_j)$. Suppose we select randomly and with equi-probability one 
object in S. What can be said about $m^\Theta\{S\}$, the bba concerning the actual 
class of that object selected in S? 
$m^\Theta\{S\}$ is the average of the bbas taken over the objects in the subset S: 

$$m^\Theta\{S\}(C) = \frac{1}{|S|} \sum_{I_j \in S} m^\Theta\{I_j\}(C) \quad \text{for } C \subseteq \Theta \qquad (1)$$



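2. Apply the pignistic transformation to $m^\Theta\{S\}$ to get the average probability 
$BetP^\Theta\{S\}$ on each singular class of this randomly selected instance. 

3. Perform the same computation for each subset $S_v^A$; we get $BetP^\Theta\{S_v^A\}$ for 
$v \in D(A)$, $A \in \mathcal{A}$. 

4. Compute Info(S) and $Info_A(S)$ as done initially by Quinlan, but using the 
pignistic probabilities. We get: 

$$\mathrm{Info}(S) = -\sum_{i=1}^{n} BetP^\Theta\{S\}(C_i) \log_2 BetP^\Theta\{S\}(C_i) \qquad (2)$$

$$\mathrm{Info}_A(S) = \sum_{v \in D(A)} \frac{|S_v^A|}{|S|}\, \mathrm{Info}(S_v^A) = -\sum_{v \in D(A)} \frac{|S_v^A|}{|S|} \sum_{i=1}^{n} BetP^\Theta\{S_v^A\}(C_i) \log_2 BetP^\Theta\{S_v^A\}(C_i) \qquad (3)$$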

Once computed, we get the information gain provided by the attribute A in 
the set of objects S such that: 

$$\mathrm{Gain}(S, A) = \mathrm{Info}(S) - \mathrm{Info}_A(S) \qquad (4)$$

5. Using the Split Info, compute the gain ratio relative to each attribute A: 

$$\text{Gain Ratio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\text{Split Info}(A)} \qquad (5)$$

In each decision node, the attribute having the highest gain ratio will be selected 
as the root of the corresponding decision tree. 
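Steps 1 to 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: bbas are dictionaries keyed by `frozenset`, rows are dictionaries of attribute values, and the toy data and all names are hypothetical.

```python
from math import log2

def average_bba(bbas):
    """Equation (1): average of the bbas of the objects in a subset."""
    avg = {}
    for m in bbas:
        for focal, mass in m.items():
            avg[focal] = avg.get(focal, 0.0) + mass / len(bbas)
    return avg

def betp(m, frame):
    """Pignistic transformation of a bba."""
    empty = m.get(frozenset(), 0.0)
    return {t: sum(v / (len(a) * (1 - empty)) for a, v in m.items() if t in a)
            for t in frame}

def info(bbas, frame):
    """Equation (2): entropy of the pignistic probability of the average bba."""
    probs = betp(average_bba(bbas), frame)
    return -sum(x * log2(x) for x in probs.values() if x > 0)

def gain_ratio(rows, bbas, attr, frame):
    """Equations (3) to (5); returns None when Split Info is zero."""
    n = len(rows)
    groups = {}
    for row, m in zip(rows, bbas):
        groups.setdefault(row[attr], []).append(m)
    info_a = sum(len(g) / n * info(g, frame) for g in groups.values())
    split = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    gain = info(bbas, frame) - info_a
    return gain / split if split > 0 else None

frame = frozenset({"c1", "c2"})
rows = [{"a": "x"}, {"a": "x"}, {"a": "y"}, {"a": "y"}]
certain = [{frozenset({"c1"}): 1.0}] * 2 + [{frozenset({"c2"}): 1.0}] * 2
ignorant = [{frame: 1.0}] * 4

r_certain = gain_ratio(rows, certain, "a", frame)    # classes known: ratio of 1
r_ignorant = gain_ratio(rows, ignorant, "a", frame)  # vacuous bbas: ratio of 0
```

With certain bbas the sketch gives the same value as the classical gain ratio, while vacuous bbas make every attribute uninformative.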



Partitioning Strategy. For the selected attribute, assign a branch correspond- 
ing to each attribute value. Thus, we get several training subsets where each one 
is relative to one branch and regrouping objects having the same attribute value. 



Stopping Criterion. It stops the development of a path and declares 
the treated training subset as a leaf. Three strategies are proposed: 

1. There is no more attribute to test. 

2. The treated training subset contains only one object. 

3. The values of the gain ratio relative to the remaining attributes are less than 
or equal to zero. 

Once the stopping criterion is fulfilled, the current node is declared as a leaf 
characterized by a bba defined on $\Theta$. The leaf's bba is equal to the average bba 
taken over the objects belonging to the same leaf. 
Constructing Algorithm. Our algorithm is an extension of the ID3 
algorithm to the uncertain context. It is composed of the following steps: 

1. Create the root node of the decision tree including all the objects of the 
training set T. 

2. Verify whether this node satisfies the stopping criterion. If it is fulfilled, 
declare it as a leaf node and compute its corresponding bba. 

3. Otherwise, look for the attribute having the highest gain ratio. This attribute 
will be designated as the root of the tree related to the whole training set T. 

4. Divide the training set according to the partitioning strategy. 

5. Create a root node relative to each training subset. 

6. For each node created, repeat the same process from step 2. 

If the bbas over the classes for every instance in the training set are described 
by a certain bba, i.e., there is no uncertainty about the actual class for all the 
objects in the training set, then we get the same results as the ID3 algorithm of 
Quinlan [7] based on the gain ratio. 
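The six steps above can be sketched as a recursive procedure. The code is a simplified illustration, not the authors' implementation: the tree is a nested dictionary, an undefined gain ratio is simply treated as non-selectable, and all names and data are hypothetical.

```python
from math import log2

def average_bba(bbas):
    avg = {}
    for m in bbas:
        for focal, mass in m.items():
            avg[focal] = avg.get(focal, 0.0) + mass / len(bbas)
    return avg

def betp(m, frame):
    empty = m.get(frozenset(), 0.0)
    return {t: sum(v / (len(a) * (1 - empty)) for a, v in m.items() if t in a)
            for t in frame}

def info(bbas, frame):
    probs = betp(average_bba(bbas), frame)
    return -sum(x * log2(x) for x in probs.values() if x > 0)

def gain_ratio(rows, bbas, attr, frame):
    n, groups = len(rows), {}
    for row, m in zip(rows, bbas):
        groups.setdefault(row[attr], []).append(m)
    info_a = sum(len(g) / n * info(g, frame) for g in groups.values())
    split = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return (info(bbas, frame) - info_a) / split if split > 0 else None

def build(rows, bbas, attrs, frame):
    """Return a leaf {'bba': ...} or a node {'attr': ..., 'children': {...}}."""
    leaf = {"bba": average_bba(bbas)}
    if not attrs or len(rows) == 1:                 # stopping criteria 1 and 2
        return leaf
    ratios = {a: gain_ratio(rows, bbas, a, frame) for a in attrs}
    usable = {a: r for a, r in ratios.items() if r is not None}
    if not usable or max(usable.values()) <= 0:     # stopping criterion 3
        return leaf
    best = max(usable, key=usable.get)
    children = {}
    for value in sorted({r[best] for r in rows}):   # one branch per value
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        children[value] = build([rows[i] for i in idx], [bbas[i] for i in idx],
                                [a for a in attrs if a != best], frame)
    return {"attr": best, "children": children}

frame = frozenset({"c1", "c2"})
rows = [{"a": "x", "b": "u"}, {"a": "x", "b": "v"},
        {"a": "y", "b": "u"}, {"a": "y", "b": "v"}]
bbas = [{frozenset({"c1"}): 1.0}] * 2 + [{frozenset({"c2"}): 1.0}] * 2
tree = build(rows, bbas, ["a", "b"], frame)   # splits on "a", then stops
```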

4.4 Procedure of Classifying New Instances 

Once constructed, the belief decision tree is used to classify 
new instances in this uncertain framework. These instances may present some 
uncertainty regarding the value of one (or several) of their attributes. The 
uncertainty related to each attribute $A_i$ can be defined by a bba $m^{A_i}$ on the 
set $\Theta_{A_i}$ of all the possible values of the attribute. An attribute whose value 
is known with certainty corresponds to a certain bba having this value as its 
only focal element. If an attribute value is unknown, it is 
expressed by a vacuous bba. 

We have to find the bba expressing beliefs on the different 
attributes' values of the new instance to classify. To this end, we 
apply the following steps: 

1. Extend the different bbas $m^{A_i}$ to the global frame of attributes $\Theta_{\mathcal{A}}$. 

2. Combine the extended bbas by applying the conjunctive rule: 

$$m^{\Theta_{\mathcal{A}}} = \bigwedge_{i=1..k} m^{A_i \uparrow \mathcal{A}} \qquad (6)$$

$m^{\Theta_{\mathcal{A}}}$ represents beliefs on the combinations of the attributes of the given 
instance. We then consider individually the focal elements of this bba. Let x be 
such a focal element. The next phase is to compute the belief functions $bel^\Theta[x]$. 

1. If the treated focal element x is a singleton (only one value for each attribute), 
then $bel^\Theta[x]$ is equal to the average belief function corresponding to the leaf 
to which this focal element is attached. 

2. If the focal element x is not a singleton (some attributes have more than 
one value), then we have to explore all the possible paths relative to this 
combination of values. Two cases are possible: 

— If these paths lead to one leaf, then $bel^\Theta[x]$ is equal to this leaf's bel. 
— If these paths lead to distinct leaves, then $bel^\Theta[x]$ is equal to the result 
of the combination of each leaf's bel by applying the disjunctive rule. 

Finally, the belief functions computed for each focal element x are 
averaged [9] using $m^{\Theta_{\mathcal{A}}}$: 

$$bel^\Theta[m^{\Theta_{\mathcal{A}}}](\theta) = \sum_{x \subseteq \Theta_{\mathcal{A}}} m^{\Theta_{\mathcal{A}}}(x)\, bel^\Theta[x](\theta) \quad \text{for } \theta \in \Theta \qquad (7)$$

Note that we have to apply the pignistic transformation in order to take a 
decision on the class of the instance to classify. 

5 Example 

Let us illustrate our method with a simple example. Assume that a bank wants to 
develop a loan policy for its clients by taking into account a number of their 
attributes. Let T be a training set (see Table 1) composed of eight instances 
(clients) characterized by three symbolic attributes: 

- Income, with possible values {no, low, average, high}, 

- Property, with possible values {less, greater}, expressing whether the 
property's value is less or greater than the loan expected by the client, 

- Unpaid-credit (denoted by Unp-c), with possible values {yes, no}, indicating 
whether the client has another unpaid credit. 

Three classes may be assigned to clients ($\Theta = \{C_1, C_2, C_3\}$): $C_1$ for clients to whom 
the bank agrees to give the whole loan, $C_2$ for those to whom the bank agrees to give 
a part of the loan, and $C_3$ for those to whom the bank refuses to give the loan. 



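Table 1. The training set T 

Income    Property   Unp-c   Class 
High      Greater    Yes     $m^\Theta\{I_1\}(C_1) = 0.7$; $m^\Theta\{I_1\}(\Theta) = 0.3$ 
Average   Less       No      $m^\Theta\{I_2\}(C_2) = 0.5$; $m^\Theta\{I_2\}(C_1 \cup C_2) = 0.4$; $m^\Theta\{I_2\}(\Theta) = 0.1$ 
High      Greater    Yes     $m^\Theta\{I_3\}(C_1) = 0.8$; $m^\Theta\{I_3\}(\Theta) = 0.2$ 
Average   Greater    Yes     $m^\Theta\{I_4\}(C_2) = 0.5$; $m^\Theta\{I_4\}(C_3) = 0.2$; $m^\Theta\{I_4\}(\Theta) = 0.3$ 
Low       Less       Yes     $m^\Theta\{I_5\}(C_3) = 0.8$; $m^\Theta\{I_5\}(C_2 \cup C_3) = 0.1$; $m^\Theta\{I_5\}(\Theta) = 0.1$ 
No        Less       Yes     $m^\Theta\{I_6\}(C_3) = 1$; $m^\Theta\{I_6\}(\Theta) = 0$ 
High      Greater    No      $m^\Theta\{I_7\}(C_1) = 1$; $m^\Theta\{I_7\}(\Theta) = 0$ 
Average   Less       Yes     $m^\Theta\{I_8\}(C_3) = 0.6$; $m^\Theta\{I_8\}(\Theta) = 0.4$ 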



Contrary to the 'traditional' training set, which includes only instances 
whose classes are known with certainty, this training set T is characterized 
by uncertainty about the classes of some instances, represented by 
bbas. The training set T thus offers a more general framework than the traditional 
one. Thanks to our belief decision tree algorithm, we are able to generate the 
corresponding tree by taking into account this uncertainty. 
Construction Procedure. Let us now construct the belief decision 
tree induced from the training set T. The first step is to find the root of 
the decision tree. Hence, we have to compute the gain ratio of each of the three 
attributes by taking into account the uncertainty embedded in the instances' classes. 

Let us briefly illustrate the computation of the gain ratio for the property 
attribute. Let $m^\Theta\{T\}$ be the average bba relative to T, 
and let $m^\Theta\{T_{greater}^{property}\}$ and $m^\Theta\{T_{less}^{property}\}$ be the average bbas relative to the 
sets of objects in T whose property attribute has the value greater and less, 
respectively. These bbas are computed by using equation (1); then their corresponding 
pignistic probabilities $BetP^\Theta\{T\}$, $BetP^\Theta\{T_{greater}^{property}\}$ and $BetP^\Theta\{T_{less}^{property}\}$ have to be 
calculated. 

Once computed, we get Info(T) = 1.535, $Info_{property}(T)$ = 1.17 and Split Info 
(property) = 1. So Gain ratio(T, property) = 0.365. By applying the same 
process, we get Gain ratio(T, income) = 0.405 and Gain ratio(T, unpaid-credit) = 0.214. 

The gain ratio criterion favors the income attribute since it presents the 
highest value. Thus, it will be chosen as the root of the decision tree and branches 
are created for each of its possible values (high, average, low, no). 

The same steps of the algorithm will be applied recursively. The belief deci- 
sion tree induced is represented by Fig. 1: 



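[Figure: tree diagram with root node Income, internal nodes Unpaid-credit and 
Property, and leaves labeled by average bbas, including $m^\Theta\{I_{13}\}$, $m^\Theta\{I_7\}$ and $m^\Theta\{I_8\}$.] 

Fig. 1. The Final Belief Decision Tree 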

Note that the leaf labeled by $m^\Theta\{I_{13}\}$ is the average bba of the set involving 
the objects $I_1$ and $I_3$, defined as: $m^\Theta\{I_{13}\}(C_1) = 0.75$; $m^\Theta\{I_{13}\}(\Theta) = 0.25$. 
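Two of the figures quoted in this example, Info(T) = 1.535 and the leaf bba of $I_1$ and $I_3$, can be checked with a short script. The sketch below encodes the class bbas of Table 1 as Python dictionaries (an assumed representation; zero masses are omitted) and applies equations (1) and (2).

```python
from math import log2

THETA = frozenset({"C1", "C2", "C3"})

def fs(*classes):
    return frozenset(classes)

# Class bbas of the eight training instances (Table 1).
T = {
    "I1": {fs("C1"): 0.7, THETA: 0.3},
    "I2": {fs("C2"): 0.5, fs("C1", "C2"): 0.4, THETA: 0.1},
    "I3": {fs("C1"): 0.8, THETA: 0.2},
    "I4": {fs("C2"): 0.5, fs("C3"): 0.2, THETA: 0.3},
    "I5": {fs("C3"): 0.8, fs("C2", "C3"): 0.1, THETA: 0.1},
    "I6": {fs("C3"): 1.0},
    "I7": {fs("C1"): 1.0},
    "I8": {fs("C3"): 0.6, THETA: 0.4},
}

def average_bba(bbas):
    avg = {}
    for m in bbas:
        for focal, mass in m.items():
            avg[focal] = avg.get(focal, 0.0) + mass / len(bbas)
    return avg

def betp(m, frame):
    empty = m.get(frozenset(), 0.0)
    return {t: sum(v / (len(a) * (1 - empty)) for a, v in m.items() if t in a)
            for t in frame}

def info(bbas, frame):
    probs = betp(average_bba(bbas), frame)
    return -sum(x * log2(x) for x in probs.values() if x > 0)

info_T = info(list(T.values()), THETA)          # close to the quoted 1.535
leaf_13 = average_bba([T["I1"], T["I3"]])       # C1: 0.75, THETA: 0.25
print(round(info_T, 3), leaf_13)
```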



Classification Procedure. Once the belief decision tree relative to the training 
set T is constructed (see Fig. 1), suppose that we want to classify an instance 
characterized by certain and exact values for its income and unpaid-credit attributes, 
namely average and yes, respectively. However, there is some 
uncertainty in the value of the property attribute, defined by: $m^{property}(\{greater\}) = 0.4$; 
$m^{property}(\{less\}) = 0.3$; $m^{property}(\Theta_{property}) = 0.3$. 

Once the attributes' bbas are extended to $\Theta_{\mathcal{A}}$ ($\Theta_{\mathcal{A}} = \Theta_{income} \times \Theta_{property} \times 
\Theta_{unpaid\text{-}credit}$), we apply the conjunctive rule. We get a joint bba $m^{\Theta_{\mathcal{A}}}$ on 
single instances or subsets of instances such that: 
$m^{\Theta_{\mathcal{A}}}(\{(average, greater, yes)\}) = 0.4$; 
$m^{\Theta_{\mathcal{A}}}(\{(average, less, yes)\}) = 0.3$; 
$m^{\Theta_{\mathcal{A}}}(\{average\} \times \Theta_{property} \times \{yes\}) = 0.3$. 

Next, we have to find beliefs on classes (defined on $\Theta$) given the values 
of the attributes characterizing the new instance to classify. Three belief 
functions have to be defined where, for each one, we take into account one focal 
element of $m^{\Theta_{\mathcal{A}}}$. According to the induced belief decision tree (see Fig. 1), we 
get: $bel^\Theta[\{(average, greater, yes)\}] = bel_4$; $bel^\Theta[\{(average, less, yes)\}] = bel_8$; 
$bel^\Theta[\{average\} \times \Theta_{property} \times \{yes\}] = bel_4 \vee bel_8$. 

These belief functions are then averaged and the corresponding BetP is 
computed. As a result, we obtain that the new instance to classify has probabilities 
of 0.14, 0.38 and 0.48 of belonging to the classes $C_1$, $C_2$ and $C_3$, respectively. So, it 
seems most probable that the loan expected by this client will be refused. 
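The computation can be reproduced with the following sketch, assuming that the two leaves reached are those holding the instances $I_4$ and $I_8$ of Table 1. Since bel is linear in the masses, averaging the belief functions amounts to averaging the bbas; the dictionary representation and the variable names are assumptions.

```python
from itertools import product

THETA = frozenset({"C1", "C2", "C3"})
m4 = {frozenset({"C2"}): 0.5, frozenset({"C3"}): 0.2, THETA: 0.3}   # leaf of I4
m8 = {frozenset({"C3"}): 0.6, THETA: 0.4}                            # leaf of I8

def disjunctive(m1, m2):
    """Disjunctive rule: products of masses are sent to the union B | C."""
    out = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        out[b | c] = out.get(b | c, 0.0) + mb * mc
    return out

def betp(m, frame):
    empty = m.get(frozenset(), 0.0)
    return {t: sum(v / (len(a) * (1 - empty)) for a, v in m.items() if t in a)
            for t in frame}

# Each weight is the mass of one focal element of the joint attribute bba.
weighted = [(0.4, m4), (0.3, m8), (0.3, disjunctive(m4, m8))]
mixed = {}
for weight, m in weighted:
    for focal, mass in m.items():
        mixed[focal] = mixed.get(focal, 0.0) + weight * mass

probs = betp(mixed, THETA)
print({c: round(p, 2) for c, p in sorted(probs.items())})
# prints {'C1': 0.14, 'C2': 0.38, 'C3': 0.48}
```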

As we note, our classification method using the induced belief decision tree 
is able to ensure the classification of new instances characterized by certain at- 
tribute values (like in the case of the standard decision tree). It has also the 
advantage (over the standard tree) to classify instances characterized by uncer- 
tain attribute values. 

6 Conclusion 

In this paper, we have developed a classification method providing a formal way 
to handle uncertainty in decision trees within the belief function framework. In 
fact, the construction procedure of the belief decision tree is ensured by tak- 
ing into account the uncertainty about the actual classes of training objects. 
Then, we have proposed a classification procedure allowing to classify objects 
characterized by uncertain attributes. This method ensures the classification of 
instances with certain attributes or even those presenting some missing attribute 
values. 

The major interest of the proposed method is that it can be applied to 
training sets where the instance classes are uncertain. Belief function theory 
offers a perfect representation of any form of uncertainty, from total knowledge 
to total ignorance, in particular more flexible than what probability theory can 
achieve. The most obvious case where belief decision trees will show their power 
is encountered where the instance classes are only known to belong to some 
subsets of the class domain. 
Abstract. Control of systems in different real-world fields such as 
Chemistry, Medicine, Robotics, etc. has been tackled for decades with 
approaches developed in the classical Control Systems field. In this paper, 
we propose a Real Time controller relying on a Knowledge Based 
System with a Temporal Multi-valued Language. The proposed controller 
deals with the typical features of a Real Time framework, such as the imprecision 
produced by the sensors and the smooth and continuous changes of the physical 
variables, and it sends the required control signals in bounded time. This 
is a major restriction that must be fulfilled by a real time controller. 

Keywords: Automated Reasoning, Computational Complexity, Real 
Time, Temporal Reasoning, Many-valued Logic, Control Systems. 



1 Introduction 

The interesting features shown by the Expert Systems together with the mature 
theoretical basis existing currently to model and process uncertain and temporal 
information, were the original technical points for envisaging new applications 
inside the field of Control Systems by a part of the research community in Ar- 
tificial Intelligence. 

These new applications have in common their strong time restrictions 
in supplying actions in real time. In other words, in addition to the existing difficulties 
of representing uncertain and temporal knowledge, a further technical step began 
to be considered, aiming at developing Artificial Intelligence techniques to cope 
with time restrictions as well. 

Thus, new theoretical material and schemes were developed progressively till 
such Artificial Intelligence techniques turned out to be an important alternative 
to perform Real Time control of complex processes. 

One of the challenges of these new techniques is to represent and to process 
inexact information, more specifically, imprecise, fuzzy, incomplete and noisy 
information. Thus, several languages have been proposed and deeply analysed to 
model such types of inexact information. Here we propose a Multi-valued Logic 
that, as we will argue later, allows modelling imprecise and incomplete information, 
both of which must be tackled in Automatic Control Systems. Indeed, imprecision 
is inevitably introduced by sensors, while incomplete information can be due 
to the malfunctioning of some hardware or software element in the input chain of 
an information source. Currently, many theoretical results on Multi-valued Logic 
have been found and real applications have been developed [10]. 

Another aspect in the representation and processing of real time signals is their 
continuous change. In order to deal with this temporal dimension, we extend 
the multi-valued language to a Temporal Multi-valued logic. Thus, the proposed 
KBS controller is modeled by a set of Multi-valued Temporal Propositional Rules. 

The controller inputs are the outputs of the physical system and the control 
signals are the outputs of the KBS controller. The control strategy is established 
by a human expert or it is generated by a computer program taking into account 
the difference between the desired and the current state of the process. The 
control signals must be sent to the physical system within a bounded time after 
the reception of some variation in an input signal meaning that the physical process 
has changed state. This property is called Reactivity. 

Thus, we shall analyse the algorithms involved in the proposed KBS 
interpreter and we shall show that the final complexity is in O(1). 

Summarising, we propose in this paper a KBS controller based on 1) a 
Propositional Temporal ^ Many-valued Logic to account for state changes and certain 
kinds of uncertainty; 2) an acyclic representation of the Rule Base of the KBS, 
namely logical literals can be ranked in levels; 3) a bottom-up interpreter 
algorithm with O(1) on-line complexity; and 4) some methods for the automatic 
validation of the KBS. 

The plan of the article is the following. After the related work, we give an 
example to show that applications fit the hypotheses made in the design of the 
proposed KBS controller. Section 4 introduces the syntax and semantics of our 
Knowledge Based System. Section 5 discusses knowledge representation issues. 
Afterwards, we describe the Logical Calculi. Section 7 describes the main 
characteristics of our KBS and its associated interpreter. Finally, section 8 analyses 
the complexity issues. 



2 Related Work 

In this section, we briefly point out some works connecting Artificial Intelligence 
techniques with Real Time process control. To our knowledge, there are no 
attempts to design a KBS with the features mentioned in the previous section. 

The connection between Real-Time processes and Artificial Intelligence 
techniques is studied in [16,17]. 

Reactivity is analysed for instance in [21,11]. In [12,18], necessary conditions 
that a KRBS interpreter must fulfill to possess reactivity have been stated. 

In [15] some issues relating incomplete information, precision in the answer 
and response time are addressed. 

^ The proposed logic is propositional and hence, it is not aimed at tackling the 
frame problem and other famous problems in Temporal Reasoning. 
The idea of anytime algorithms has turned out to be of remarkable relevance in 
designing algorithms that can supply answers at any time during processing [3]. 

Search space problems with bounded computational time, where the quality 
of the heuristic is linked to the relative approximation of the answer to a 
global optimum, are studied in [14,13]. 

Diagnosis and control in real time is a major field in Artificial Intelligence in 
Medicine, see for instance the special issue entitled "Computational Intelligent 
Diagnostic Systems" of [2]. It has special relevance in the 
subfield of intensive heart care, see for instance the special issue entitled "KBS in 
cardiovascular medicine" of [1]. 

Bi-dimensional systems representing and reasoning with temporal and 
uncertain information have also appeared in [9,7,6,19]. 

Finally, an alternative to classical control systems for complex systems is the 
so-called Fuzzy control, which is also based on KBSs. These systems operate by 
handling uncertainty but without temporal reasoning. For a survey of this topic 
and its applications see for instance [4,20]. 

3 A Real World Example 

The described problem is part of a project aiming to implement a Real Time 
KBS at the Clinical Hospital of Barcelona (Spain) to execute automatically 
the required reactive actions in a Pediatric Intensive Care Unit (PICU). The 
main difference between reactivity in a PICU and in any other ICU is 
that temporal constraints are stronger due to the fragile nature of newborns. 
Physicians in an ICU must perform both classical diagnosis and reactive diagnosis. 
Indeed, one main feature of patients in ICUs is that they are in critical states 
which could evolve dangerously to unstable states with dramatic outcomes. 

Before computer technologies were available, the evolution of critical 
patients was monitored by means of analog equipment and all patient bio-signals 
were under continuous supervision of medical personnel. However, this continuous 
supervision causes natural tiredness of the medical personnel, implying an 
unavoidable loss of the dynamic tracking of the monitored signals during some 
moments. Unfortunately, within these unsupervised intervals dangerous 
pathological tendencies can appear and thus, the emergency state is detected with 
some delay. This delay could seriously aggravate the critical state of the patient 
and may make it impossible to control the danger. 

Thus, automating the monitoring, supervision, diagnostic and therapeutic 
steps will avoid such severe drawbacks existing in manual operations. 

The way physicians reason in a PICU is compatible with the construction 
of the well-known KBSs applied in Medicine. 

4 Syntax and Semantics 

Let us start with the classical boolean logic to better grasp a straightforward 
extension called boolean multi-valued logic. This logic is then generalised to take 
into account the temporal dimension. In Real Time Knowledge Based Systems 
we deal with Horn Theories. A set of facts models a state and a set of implication 
rules allows deducing implicit information associated with the states. 

4.1 Syntax 

Definition 1. Boolean Bi-valued Logic BBL. It has two possible semantic 
values {0,1} for each propositional variable. Classically, the literals are noted p 
and ¬p respectively. According to our Many-valued notation, they would be noted 
{1}:p and {0}:p respectively. 

Definition 2. Boolean Multi-valued Logic BML. It is a straightforward extension 
of the BBL; the only change lies in the cardinality of the set N of truth values 
that an interpretation can assign to a proposition p. Thus, a literal in BML is 
noted S:p and its complemented literal $N \setminus S$:p. N:p is a tautology. 

Remark A large class of applications relies on this logic [10], but its suitability 
for Real Time Systems has not been considered yet. 

Definition 3. Temporal Multi-valued Logic. A TML literal is a triple 
(T,S,p), where S and p are as defined before and T is an interval [lb up]. If 
lb=up, the literal represents instantaneous information. 

Definition 4. Knowledge Based System. A KBS is composed of a set of 
facts and rules. Facts and literals in rules are TML literals. The antecedent of a 
rule is a conjunction of TML literals and its consequent is a TML literal. 

Example 1. An example of a rule expressed in our language would be: 

$$([t - 1h, t], (high), P) \wedge ([t - 1h, t], (high), Q) \rightarrow ([t, t + 2h], (very\text{-}high), R)$$

where t is the current instant and 1h stands for one hour. The rule models a 
particular situation in which if P and Q are high for the last hour then R will 
be very high in the next two hours. 

4.2 Semantics 

Definition 5. Interpretation. An interpretation is a map from the set of 
propositions to the pairs of time intervals $S_T$ and value intervals $S_S$: 

$$I : P \rightarrow S_T \times S_S$$

Definition 6. Literal Satisfiability. An interpretation I satisfies a literal 
(T,S,p) if $I(p) = (T_p, S_p)$ and $T \subseteq T_p$ and $S_p \subseteq S$. 

Definition 7. Rule Satisfiability. An interpretation satisfies a rule if it fails 
to satisfy at least one antecedent literal or it satisfies the consequent literal. 

Definition 8. Logical Consequence. A literal is a logical consequence of a 
KBS if all the interpretations that satisfy the KBS also satisfy the literal. 
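Definitions 5 to 7 translate almost directly into code. The sketch below is only illustrative: time intervals are pairs (lb, ub), value sets are Python `frozenset`s of linguistic labels rather than numeric intervals, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    t: tuple        # time interval (lb, ub)
    s: frozenset    # admitted truth values
    p: str          # proposition name

def within(inner, outer):
    """True when the interval `inner` is contained in the interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def satisfies(interp, lit):
    """Definition 6: I(p) = (Tp, Sp), T included in Tp and Sp included in S."""
    tp, sp = interp[lit.p]
    return within(lit.t, tp) and sp <= lit.s

def satisfies_rule(interp, antecedent, consequent):
    """Definition 7: some antecedent literal fails, or the consequent holds."""
    return (any(not satisfies(interp, lit) for lit in antecedent)
            or satisfies(interp, consequent))

# Interpretation: P is "high" over [0, 10]; R is "very-high" over [10, 12].
interp = {"P": ((0, 10), frozenset({"high"})),
          "R": ((10, 12), frozenset({"very-high"}))}
ok = Literal((2, 5), frozenset({"high", "very-high"}), "P")
too_long = Literal((2, 12), frozenset({"high"}), "P")
rule_holds = satisfies_rule(interp, [ok],
                            Literal((10, 12), frozenset({"very-high"}), "R"))
```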
5 Representation Issues 

Reasoning without considering imprecision and/or missing information could 
lead to erroneous decisions and hence cause considerable harm. Indeed, the 
controller could lose control of the process. We briefly discuss below the 
sources of this inexact information as well as a few concepts related to it. 

Imprecision can be due to the analytical or digital sensors 
and to the way experts reason in their domain of expertise, because of the use of the 
so-called linguistic labels appearing in the statements employed to explain the 
relationship among physical variables. 

5.1 Sensor Imprecision 

Imprecision is a feature inherent to sensors. The resolution of a sensor indicates 
the minimal variation of the signal perceptible by that sensor. Namely, 
a variation smaller than the sensor resolution has no effect on the output of the 
sensor. 

Inaccurate Information. The existence of a sensor resolution entails that 
the output value of the sensor, which is possibly the input to the information 
process, is not the real input value. More precisely, if SR is the sensor resolution 
value, then its output value is related to the real physical value by: 

$$val(physic) \in [val(sensor) - SR,\ val(sensor) + SR]$$

Digital Sensor. As we address here the control by means of a computer, 
the sensor output must be digital. If the computer processes data represented 
by at most n bits, then $SR = 1/2^n$. 

5.2 Imprecision Due to Linguistic Labels 

Some inputs to the system can be provided by an expert. For example, in 
medicine the physician can provide some information on the clinical 
history of a patient, the dynamics of a certain illness, the relationship among 
particular parameters (cardiac rhythm, blood pressure, body temperature, etc.) and 
so on. Thus, an expert can express his knowledge by stating for instance "if a 
variable is between high and very high then ...". That is, first the expert qualifies 
the quantitative values of a variable by mapping its real values to a limited set of 
so-called Linguistic Labels (low, average, high, very-high, ...) and second, 
he refers to it by mentioning intervals, for instance [high very-high]. In such a 
situation, one has: 

$$LB(val(var)) \in [LB_1, LB_2]$$

where LB(val(var)) stands for the linguistic label associated to val(var) and $LB_1$ 
and $LB_2$ are two linguistic labels, the first one associated to smaller values than 
those associated to the second one. 

Inaccurate information modeling. The effect of the sensor resolution 
can be captured easily by a TML literal (T,S,p) by taking: 

$$S = [val(sensor) - SR,\ val(sensor) + SR]$$
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5.3 Incomplete Information 

The controller sequentially reads the input signals, performs some reasoning,
and afterwards sends the output signals. A sensor breakdown or a connection
problem lasting some time can mean that a particular physical variable is not
measured when the automatic controller needs it. Similarly, the human expert
may for some reason omit or not know certain information at a given
moment and so not supply it to the automatic controller. These situations
can be covered by a literal (T, Dom(p), p), where Dom(p) is the whole domain of
possible values of the unknown variable.
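
The modeling of inaccurate and incomplete information described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Literal` record, the function names, and the choice of closed numeric intervals for both T and S are assumptions made here and are not part of the TML definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A TML literal (T, S, p): variable p takes a value in S during T."""
    t_start: float   # T = [t_start, t_end]  (assumed closed interval)
    t_end: float
    lo: float        # S = [lo, hi]          (assumed closed interval)
    hi: float
    var: str         # the propositional variable p


def sensor_resolution(n_bits: int) -> float:
    """SR = 1 / 2^n for a sensor digitized on at most n bits."""
    return 1.0 / (2 ** n_bits)


def sensor_literal(var, reading, n_bits, t_start, t_end):
    """Inaccurate information: S = [reading - SR, reading + SR]."""
    sr = sensor_resolution(n_bits)
    return Literal(t_start, t_end, reading - sr, reading + sr, var)


def unknown_literal(var, domain, t_start, t_end):
    """Incomplete information: S is the whole domain Dom(p)."""
    lo, hi = domain
    return Literal(t_start, t_end, lo, hi, var)
```

For instance, `sensor_literal("temp", 0.5, 3, 0, 1)` yields a literal whose value interval is [0.375, 0.625], while `unknown_literal("temp", (0.0, 1.0), 0, 1)` spans the whole (hypothetical) domain.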

6 Logical Calculi 

There are two inference rules: one is a direct extension of
Modus Ponens to our logic, and the other refers to the intervals of values and
time.

Intervals Rule (IR) 



$$\frac{(T, S, p), \quad T \supseteq T', \quad S' \supseteq S}{(T', S', p)}$$



Temporal Multivalued Modus Ponens (TMMP) 

We write L = (T, S, p) and I(L) = (T', S', p), with $T \supseteq T'$ and $S' \supseteq S$. Hence,
$L \vdash I(L)$.

$$\frac{L_1, \ldots, L_k, \quad I(L_1) \wedge \ldots \wedge I(L_k) \rightarrow (U, V, q)}{(U, V, q)}$$
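
The two rules can be illustrated with a small Python sketch. It assumes, purely for illustration, that time and value sets are closed numeric intervals given as `(low, high)` pairs; the names `Literal`, `ir_entails` and `tmmp` are hypothetical and not taken from the paper.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Interval = Tuple[float, float]


class Literal(NamedTuple):
    time: Interval    # T
    values: Interval  # S
    var: str          # p


def contains(outer: Interval, inner: Interval) -> bool:
    """True when the closed interval `outer` is a superset of `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def ir_entails(known: Literal, goal: Literal) -> bool:
    """Intervals Rule: (T, S, p) entails (T', S', p) when T contains T'
    and S' contains S."""
    return (known.var == goal.var
            and contains(known.time, goal.time)
            and contains(goal.values, known.values))


def tmmp(facts: Sequence[Literal],
         antecedents: Sequence[Literal],
         consequent: Literal) -> Optional[Literal]:
    """Temporal Multivalued Modus Ponens: fire the rule
    I(L_1) and ... and I(L_k) -> consequent when every antecedent I(L_i)
    follows from some known fact L_i by the Intervals Rule."""
    for goal in antecedents:
        if not any(ir_entails(fact, goal) for fact in facts):
            return None
    return consequent
```

With a fact `Literal((0, 10), (36.5, 37.5), "temp")`, the antecedent `Literal((2, 5), (36.0, 38.0), "temp")` is satisfied, since it covers a shorter time span and a wider value range.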

Theorem 1 (Soundness). Let $\vdash_{\{IR,TMMP\}}$ denote the logical calculus formed by
the IR and TMMP rules. Then $\vdash_{\{IR,TMMP\}}$ is a sound calculus.

Theorem 2 (Completeness). If $KBS \models L$ then $KBS \vdash_{\{IR,TMMP\}} L$.

7 KBS Design Issues 

Notation. We distinguish between input variables I, output variables O and
state variables S.

In control systems, the dynamics of the system are mathematically well established
by:



$$S(t) = f(I(t), S(t-1), S(t-2), \ldots, S(t-k))$$



$$O(t) = g(I(t), S(t))$$



Real-time control requires that the computation of g(I(t), S(t))
take at most O(1) time in order to fulfill the Reactivity property.

Each output control variable o is associated with a set of rules whose
consequent is a literal of the form (T, S, o).

Each input signal i coming from a sensor is associated with literals of the form
(T, S, i) appearing in rule antecedents.

Other literals are defined by the expert, who needs to declare intermediate
deductions. The intermediate deduced variables are the consequent literals
of a set of rules. The values taken by these variables at different
instants form the state of the controller. Intuitively, this state is related to the
diagnosis of the external process. Thus, according to the combination of these
values, the control signals O are defined so as to keep the system continuously
under control.

In order to design the Knowledge Based System, the expert must have clearly
in mind:

The domain of each input variable, the set of different diagnoses of the
physical state (or, equivalently, a proper classification of physical states), and the action
to be taken in each diagnosed state. Experience with Expert Systems, Fuzzy
Control and other KBS technologies has shown that for many cognitive fields
there exists a Rule Based System whose interpretation by an algorithm in an
on-line process has the properties required of an efficient control.

Although the design of KBS has so far been done in an ad hoc way, some attempts
have recently been made, see for instance [8], to design a KBS in
a systematic way. The aim of this methodology is to bridge the gap between a
specification of a problem in natural language and a specification in a declarative
programming language.

In this article, we propose a language and the required algorithms that
make such a KBS possible, in order to obtain an adequate controller. As
mentioned, KBS have emerged over several decades as a serious alternative for
control when the system to be controlled is very difficult to model.

We consider three types of rules: 

- Input rules: all their antecedent literals are associated with input sensors.

- Output rules: their consequent literal is associated with an output control signal.

- Deduction rules: they are not output rules, i.e. the consequent represents
  deduced information used by the expert.

The KBS is associated with an AND/OR graph as usual. Each node corresponds
to a propositional variable and each rule to an AND connector. The set of connectors
whose consequent literal contains the same propositional variable defines an
OR connector. The literals that have no descendant connectors are input
literals. The literals that have no ascendant connector are associated with
output control signals. Nodes having both ascendants and descendants are intermediate
deduced variables that correspond to the state variables S. As mentioned, they
help the expert to diagnose a part of the physical system, and they also
provide preliminary information required by the expert before stating with
confidence the diagnosis and its corresponding control signals.
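
The classification of nodes in the AND/OR graph can be sketched as follows. The representation of a rule as a pair `(antecedent_variables, consequent_variable)` and the function names are assumptions made for this illustration only; time and value intervals are ignored here.

```python
from collections import defaultdict


def or_connectors(rules):
    """Group rule indices (AND connectors) by consequent variable.
    Each group with the same consequent forms one OR connector."""
    groups = defaultdict(list)
    for index, (_, consequent) in enumerate(rules):
        groups[consequent].append(index)
    return dict(groups)


def classify_variables(rules):
    """Split variables into inputs, outputs and intermediate (state) ones.

    - inputs: never the consequent of a rule (no descendant connector)
    - outputs: never an antecedent of a rule (no ascendant connector)
    - state: appear both as consequent and as antecedent
    """
    consequents = {cons for _, cons in rules}
    antecedents = {a for ants, _ in rules for a in ants}
    inputs = antecedents - consequents
    outputs = consequents - antecedents
    state = antecedents & consequents
    return inputs, outputs, state
```

For the hypothetical rule base `[(("i1", "i2"), "s1"), (("i3",), "s1"), (("s1", "i3"), "o1")]`, the inputs are `i1`, `i2`, `i3`, the only output is `o1`, and `s1` is a state variable defined by an OR connector over the first two rules.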

Human experts do not associate a different diagnosis, and hence a different
control, with each of the infinitely many combinations of the input variables. What an expert
does is qualify the values of variables that have an infinite or large number of possible
values, classifying them into a finite set (between three and ten) of values.
Each set is referred to by a linguistic label.

Intuitively, this is because a part of the hyper-space of values corresponds
to the same qualitative situation. Namely, although the state of the system is not
exactly the same for two different values in the same region, the variation is not
significant enough to change the qualitative diagnosis.

The values of the state variables are also qualified, so that they take
a limited number of values, usually ranging from three

< Yes, unknown, No >

to, for example, ten

< very-low, low, ..., very-high >

Each value of a qualified variable corresponds to a virtual range of values on a
numerical reference scale.

To take into account the variation of the signal with respect to time, a
propositional variable v(t) is stored as an array:

t => v(t) : v(t-1) : ... : v(t-k)

To keep a record of these values at the next instant, it is enough to
execute a simple shift of the set of consecutive memory cells and to add the new
value v(t+1):

t+1 => v(t+1) : v(t) : ... : v(t-k+1)
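
The shift of memory cells described above behaves like a fixed-length queue. A minimal sketch, assuming a hypothetical `History` class built on Python's `collections.deque`:

```python
from collections import deque


class History:
    """Keeps the last k+1 values v(t), v(t-1), ..., v(t-k) of one variable."""

    def __init__(self, k, initial=None):
        # maxlen makes the deque drop the oldest value automatically
        self._cells = deque([initial] * (k + 1), maxlen=k + 1)

    def push(self, value):
        """Store v(t+1); the oldest value v(t-k) is shifted out."""
        self._cells.appendleft(value)

    def __getitem__(self, lag):
        """Return v(t - lag), with lag = 0 for the current instant."""
        return self._cells[lag]

    def as_list(self):
        return list(self._cells)
```

Because the length is bounded by k+1, each update costs a constant amount of work per variable, which is in line with the reactivity requirement discussed in the text.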

The principle of the algorithm consists in propagating the values of the
antecedents of a rule to the consequent. This propagation begins at the input
signals and ends at the control signals, thereby following a bottom-up strategy.

8 Complexity Issues 

The forward algorithm is a straightforward extension of the algorithm in [5],
designed for the propositional case, to our temporal multi-valued language.

In our case, we assume that the connectors are arranged in levels.
Connectors of level 1 are those whose antecedents are all input variables. Connectors
of level k+1 are those whose antecedents are of level at most k, with at least
one literal of level k.
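
The level assignment can be sketched as below. The sketch makes assumptions beyond the text: rules are `(antecedents, consequent)` pairs over variable names, the rule base is acyclic, every rule has at least one antecedent, and a variable defined by several rules (an OR connector) takes the highest level among them.

```python
from functools import lru_cache


def connector_levels(rules):
    """Return the level of each rule (AND connector), in rule order."""
    producers = {}
    for index, (_, consequent) in enumerate(rules):
        producers.setdefault(consequent, []).append(index)

    @lru_cache(maxsize=None)
    def var_level(var):
        # Input variables are produced by no rule and sit at level 0.
        indices = producers.get(var)
        if not indices:
            return 0
        return max(rule_level(i) for i in indices)

    @lru_cache(maxsize=None)
    def rule_level(index):
        antecedents, _ = rules[index]
        # Level k+1: all antecedents at most k, at least one exactly k.
        return 1 + max(var_level(a) for a in antecedents)

    return [rule_level(i) for i in range(len(rules))]
```

Processing the rules in increasing level order then guarantees that every antecedent has been evaluated before the connector that uses it, so each connector is computed once.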

Linearity. The linear complexity O(n) of the algorithm in [5] makes it possible to design
an interpreter with linear complexity:

- Connectors are processed by levels and each connector is computed at most
  once.

- To determine whether a rule can be fired, each antecedent must have a truth
  evaluation at the current instant. To check this, one needs to match each literal
  (T, S, p) with the corresponding input or deduced literal $(T_d, S_d, p)$. Thus
  (T, S, p) is true whenever $T \subseteq T_d$ and $S_d \subseteq S$.

Reactivity. Since the complexity of the algorithm is O(n), by definition
of big “O” the maximal time taken by the propagation process
is bounded by $k_1 n + k_2$, where $k_1$ and $k_2$ are experimental constants. Once
the KBS is final and no longer modified, n is fixed, so the maximal time to
compute the output signals is fixed and can be determined experimentally. In
other words, with n fixed, the complexity of the algorithm is O(1), which
verifies the reactivity property.

Coarse Parallelism. The reactivity time can be reduced using parallelism.
With a coarse granularity on a multiprocessor architecture of p processors,
the reactivity time K can be reduced to some K' with K/p < K' < K.

Fine Parallelism. If we use a fine-granularity architecture with one
processor for each variable and interval and one for each connector, that is, O(n)
processors, we can achieve a parallel complexity of O(D log R), where D is the
number of levels in the KBS and R is the maximal cardinality of the antecedents
of the rules. Hence, K'' < K'.

9 Conclusion 

In this article we have proposed a language and an interpreter to perform
real-time control of physical systems. This method relies on the Knowledge Based
System paradigm. The language underlying our approach is a temporal
multi-valued language which copes with two major issues in representing knowledge:
on the one hand, imprecise and incomplete information, and on the other hand,
temporal information.

Thus our proposed method is an alternative for developing real-time control
when the physical process is difficult to model, avoiding in this way the
mathematical approach associated with the Control Systems field.

Many advantages of our KBS device have been pointed out, and the strict
restrictions on computing the output signal demanded by a real-time control
loop have been shown to be fulfilled.

Abstract. A number of formalizations of contexts have been proposed
since the seminal paper of J. McCarthy on contexts [8]. Despite consid- 
erable research efforts over the last two decades aimed at formalizing 
contexts, there are still a number of important aspects from a formal 
point of view that have not been sufficiently studied. This paper is a part 
of a more general study of several classes of multicontext systems aimed 
at characterizing them with respect to their languages. The present pa- 
per addresses the simplest multicontext systems, with propositional lan- 
guages. Hilbert-style syntax is introduced as well as a context version of 
modal Kripke semantics. Correctness and completeness of the proposi- 
tional multicontext systems are proved as well as their decidability. 



1 Introduction 

In many different domains the notion of context plays an important role. In- 
tuitively, the notion of context is used to capture the meaning of all relevant 
factors in the environment that can affect an agent's behavior and help it to
reduce the number of unexpected situations. In AI, contexts are used to overcome 
the problems resulting from huge knowledge bases (KBs) and also to tackle the 
problems related to "generality" [8]. The alternative suggested by the contextual
approach is based on structuring the original knowledge base into a subset of 
smaller and easier manageable units. An essential aspect in such a partitioning 
is that the knowledge in each subset is assumed to be grouped based on given 
features, e.g. they might be grouped based on the problem being solved or based 
on a particular sub-domain. The next important aspect related to the partitioning
of the global KB is to provide "local reasoning" in each unit or context. This
property would enable us to localize the search for a solution within a particular 
context rather than searching the global knowledge base. However, partitioning 
the global KB into a subset of units does not solve the problem associated with 
the complexity of knowledge manipulation. The units of the global KB must be 
provided with "channels of communication" so that some facts derived in one of
the units may be made accessible to the other units. It is also important that
the interactions between contexts be based on reliable (justified) information.
In a way, a system of contexts can be viewed as a set of "local" contexts connected
by channels of communication into a network of contexts, i.e. into a
multicontext system.

A number of formalizations of contexts have been proposed [1,5, 6, 7] since the 
seminal paper of J. McCarthy on contexts [8]. One possible approach of rep- 
resenting contexts is based on the notion of logical formal system. Each logical 
system can be viewed as a compound system with three components: a lan- 
guage, defining which sequences of symbols of a given alphabet are well formed 
formulas; a set of axioms asserting a collection of facts assumed to be true in the 
system; and a set of inference rules inferring new facts from already proven facts. 
If we associate each context from a given set of contexts with a particular logical 
formal system and introduce further new rules that enable us to infer new facts 
in a context from premises derived in some other contexts, then we arrive at the 
notion of a multicontext system. In the following considerations, the notion of 
context and the notion of logical formal system will be used as synonyms. 

Despite considerable research efforts over the last two decades aimed at formaliz- 
ing contexts, there are still a number of important aspects from a formal point of 
view that are not understood sufficiently. This paper is a part of a more general 
formal study of contexts. It addresses the simplest type of multicontext sys- 
tems, with propositional languages, namely, propositional contexts and studies 
their properties from a formal point of view. The paper starts with definitions 
of some basic notions, which enable us to give a formal definition of a multi- 
context system and derivability in it. In the second part we introduce a Hilbert 
style syntax as well as a context version of the modal Kripke semantics. In the 
following part, correctness and completeness of the propositional multicontext
systems are proved, as well as their decidability.

2 Basic definitions 

Following [6] in this section we introduce some basic definitions related to the 
multi-context systems and derivability in them. 

Definition 1. A context $c_i$ is a triple $c_i = \langle L_i, A_i, \Delta_i \rangle$, where $L_i$ is the language
of $c_i$, $A_i$ is the set of axioms of $c_i$ and $\Delta_i$ is the set of inference rules.

Each context describes the world from its specific point of view, based on its 
expressive and reasoning capabilities. 

Definition 2. A multicontext system (MCS) is defined as a pair $\langle \{c_i\}_{i \in I}, BR \rangle$,
where I is a set of indices, $\{c_i\}_{i \in I}$ is a set of contexts and BR is a set of bridge
rules.

If $\varphi \in L_i$, then $c_i : \varphi$ is a context formula, that is, any context formula contains
the name of its context as a label. Thus $c_i : \varphi$ denotes the formula $\varphi$ and the
fact that $\varphi$ is a formula of the context $c_i$. In addition we extend the set of
context formulas by adding one more rule for constructing new formulas. If
$c_i : \varphi$ is a context formula, then $c_j : ist(c_i, \varphi)$ is also a context formula for all
$j \in I$. Intuitively, the formula $c_j : ist(c_i, \varphi)$ is used to express the fact that $c_j$ is
"aware" that $\varphi$ is true in $c_i$.

To make a distinction between the two types of inference rules, the inference
rules that belong to $\Delta_i$ are called internal inference rules or simply inference
rules, while the inference rules that belong to BR are called bridge rules. The
inference rules specify the "local deduction" in $c_i$, while the bridge rules specify
the interaction and constraints between contexts. In fact, the bridge rules make
a collection of contexts an MCS. In general the bridge rules are of the following
types

(I)
$$\frac{c_1 : \alpha_1 \quad \ldots \quad c_n : \alpha_n}{c : \alpha}\; \iota$$

or

(II)
$$\frac{c_1 : \alpha_1 \;\ldots\; c_m : \alpha_m \quad \begin{array}{c}[c_{m+1} : \beta_{m+1}]\\ c_{m+1} : \alpha_{m+1}\end{array} \;\ldots\; \begin{array}{c}[c_n : \beta_n]\\ c_n : \alpha_n\end{array}}{c : \alpha}\; \iota$$

The most popular types of bridge rules are the reflection up bridge rules and the
reflection down bridge rules [6]. The following are instances of reflection up and
reflection down bridge rules, which we call ist-inferring bridge
rules and ist-eliminating bridge rules respectively.

An ist-inferring bridge rule such as

$$\frac{c_j : A}{c_i : ist(c_j, A)}\; I_{ist}$$

enables us to infer in a given context $c_i$ that in some other context $c_j$ the formula
A is true, based on the fact that this formula has been proven in the second
context.

An ist-eliminating bridge rule

$$\frac{c_i : ist(c_j, A)}{c_j : A}\; E_{ist}$$

enables us to infer in a given context $c_j$ that a given formula A is true, based on
the fact that in some other context $c_i$ the formula $ist(c_j, A)$ has been derived.

Notice that in the previous example it is assumed that all contexts have the
same language, i.e. each wff in a context $c_i$ is also a wff in any other context $c_j$.

The following definition introduces the notion of derivability (entailment) in 
multicontext systems. Each proof is a tree structure, whose elements are context 
formulas obtained by the following rules 

Definition 3. (Proof of a formula depending on a set of assumptions)

(a) If $\alpha \in A_i$, then $c_i : \alpha$ is a proof of $c_i : \alpha$ depending on $\emptyset$.

(b) If $\alpha \notin A_i$, then $c_i : \alpha$ is a proof of $c_i : \alpha$ depending on $\{c_i : \alpha\}$.

(c) If $\Pi_k$ is a proof of $c_k : \alpha_k$ depending on $\Gamma_k$ for each k such that
$1 \le k \le n$, then

$$\frac{\Pi_1 \quad \ldots \quad \Pi_n}{c : \alpha}\; \iota$$

is a proof of $c : \alpha$ depending on $\Gamma$, where

$$\Gamma = \bigcup_{1 \le k \le n} \Gamma_k,$$

assuming that the applied rule $\iota$ is of the type I, or

$$\Gamma = \bigcup_{1 \le k \le m} \Gamma_k \;\cup \bigcup_{m < k \le n} (\Gamma_k \setminus \{c_k : \beta_k\}),$$

assuming that the applied rule $\iota$ is of the type II.

Definition 4. A context formula $c : \alpha$ is derivable in a given multicontext system
from a set of context formulas $\Gamma$ if there exists a proof of $c : \alpha$ depending
on $\Gamma$. By $\Gamma \vdash c : \alpha$ we denote the fact that $c : \alpha$ is derivable from $\Gamma$. A formula
$c : \alpha$ is a theorem in a given multicontext system if $\emptyset \vdash c : \alpha$, denoted also by
$\vdash c : \alpha$.



One fundamental difference between our approach to formalizing contexts [2,3]
and that of other authors, such as the work done by Giunchiglia and his colleagues
[5,6], is in the type of bridge rules that are used. While most of the MCS defined in
[6] introduce two types of bridge rules, reflection down and reflection up bridge
rules, in this paper we restrict our considerations to a class of multicontext
systems based on reflection up bridge rules of the type

$$\frac{c_i : \varphi}{c_j : ist(c_i, \varphi)}$$

termed by us ist-inferring bridge rules. 

One motivation for addressing multicontext systems limited to ist-inferring bridge
rules is our observation that multicontext systems extended
with a reflection down (ist-eliminating) type bridge rule such as

$$\frac{c_j : ist(c_i, \varphi)}{c_i : \varphi}$$

are in a sense more vulnerable to inconsistencies compared to the former class. In
MCS provided with ist-eliminating ($E_{ist}$) bridge rules, any inconsistency derivable
in one context is propagated to the other contexts of the system:

$$\frac{\dfrac{c_i : \bot}{c_i : ist(c_j, \bot)}}{c_j : \bot}\; E_{ist}$$



It turns out that it is possible to overcome this shortcoming by limiting our
consideration to MCS with ist-inferring bridge rules. As a result of the elimination
of the ist-eliminating bridge rules we achieve a locality of inconsistency, that is,
any inconsistency of a given context is its internal property and cannot be
imported from outside. However, the consequences of such restrictions are not
only beneficial. One problem with contexts limited to ist-inferring interactions
is that it is possible to derive in a given context $c_i$ facts such as $c_i \vdash ist(c_j, \varphi)$
while $\varphi$ is not derivable in $c_j$. Thus it is possible for a context $c_i$ to assert that a
formula $\varphi$ holds in some other context $c_j$ while it does not. Such ist-assertions
make $c_i$ incorrect for $c_j$. For detecting and studying contexts with such
properties we have introduced the notion of importing context [3]. Intuitively, importing
contexts are contexts where the ist-eliminating bridge rule is a derivable
property. By eliminating the ist-eliminating bridge rules we arrive naturally at
possible world semantics, which is demonstrated in the present paper for the
propositional case of multicontext systems.

In the following sections we study the simplest class of multicontext systems,
where all contexts are defined in a propositional language. We introduce a
Hilbert-style syntax as well as a context version of modal Kripke semantics.
The correctness and completeness of the inference system are proved, as well
as its decidability.



3 Propositional multicontext systems 

3.1 Syntax 

Let VAR be a countable set of propositional variables and C be an at most 
countable set of context constants. 

Definition 5. 1. If $c \in C$, then $c : \top$ and $c : \bot$ are formulas.

2. If $p \in VAR$ and $c \in C$, then c : p is a formula.

3. If c : A is a formula, then $c : \neg A$ is also a formula.

4. If c : A and c : B are formulas, then $c : A \vee B$, $c : A \wedge B$, $c : A \rightarrow B$ are
formulas.

5. If c : A is a formula and $c' \in C$, then c' : ist(c, A) is also a formula.

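The following is the set of axioms of the multicontext system, including:

A context version of the classic propositional axioms:

$Ax_1$ : $c : \alpha \rightarrow (\beta \rightarrow \alpha)$
$Ax_2$ : $c : (\alpha \rightarrow \beta) \rightarrow ((\alpha \rightarrow (\beta \rightarrow \gamma)) \rightarrow (\alpha \rightarrow \gamma))$
$Ax_3$ : $c : \alpha \rightarrow (\beta \rightarrow \alpha \wedge \beta)$
$Ax_4$ : $c : \alpha \wedge \beta \rightarrow \alpha$
$Ax_5$ : $c : \alpha \wedge \beta \rightarrow \beta$
$Ax_6$ : $c : \alpha \rightarrow \alpha \vee \beta$
$Ax_7$ : $c : \beta \rightarrow \alpha \vee \beta$
$Ax_8$ : $c : (\alpha \rightarrow \gamma) \rightarrow ((\beta \rightarrow \gamma) \rightarrow (\alpha \vee \beta \rightarrow \gamma))$
$Ax_9$ : $c : (\alpha \rightarrow \beta) \rightarrow ((\alpha \rightarrow \neg\beta) \rightarrow \neg\alpha)$
$Ax_{10}$ : $c : \neg\neg\alpha \rightarrow \alpha$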


The following is the K axiom in its context version:
$c : ist(c', \alpha \rightarrow \beta) \rightarrow (ist(c', \alpha) \rightarrow ist(c', \beta))$

And the axiom $\neg\bot$:
$c : \neg ist(c', \bot)$

In the following sections we examine two separate multicontext systems that
result respectively from the inclusion and the exclusion of the axiom $\neg\bot$.

The following two expressions define the inference rules included in both multi- 
context systems. 

Modus ponens (MP):

$$\frac{c : \alpha \qquad c : \alpha \rightarrow \beta}{c : \beta}$$

$I_{ist}$ bridge rule:

$$\frac{c : \alpha}{c' : ist(c, \alpha)}$$

Thus we have defined a class of multicontext systems using Hilbert-style ax- 
ioms for its axiomatization. The following are some basic definitions related to 
derivability. 

Definition 6. Let X be a set of formulas and c : A be a formula. The formula
c : A is said to be derivable from X if there exists a finite sequence of formulas
$c_1 : A_1, c_2 : A_2, \ldots, c_n : A_n$ such that

1. $c_n = c$ and $A_n = A$, and

2. each element of the sequence is either an axiom or an element of X, or is
obtained from the previous elements by applying one of the two inference rules.

The sequence of formulas $c_1 : A_1, c_2 : A_2, \ldots, c_n : A_n$ is called a proof of c : A
from X. The fact that c : A is derivable from X is denoted by $X \vdash c : A$.

Definition 7. Any formula that is derivable from $\emptyset$ is called a theorem. The
fact that c : A is a theorem is denoted by $\vdash c : A$.

3.2 Semantics

Definition 8. Let $U \neq \emptyset$ and $R \subseteq C \times U \times U$. Then (C, U, R) is called a Kripke
context structure.

Definition 9. Any mapping $v : C \times VAR \times U \rightarrow \{0, 1\}$ is called a truth
assignment function.

Let (C, U, R) be a Kripke context structure and v be a truth assignment function in
it. The following is an extension of the truth assignment function to a truth
assignment of any formula:



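$v(c : \bot, x) = 0$
$v(c : \top, x) = 1$
$v(c : \neg A, x) = 1 \Leftrightarrow v(c : A, x) = 0$
$v(c : A \vee B, x) = 1 \Leftrightarrow v(c : A, x) = 1$ or $v(c : B, x) = 1$
$v(c : A \wedge B, x) = 1 \Leftrightarrow v(c : A, x) = 1$ and $v(c : B, x) = 1$
$v(c : A \rightarrow B, x) = 1 \Leftrightarrow v(c : A, x) = 0$ or $v(c : B, x) = 1$
$v(c : ist(c', A), x) = 1 \Leftrightarrow \forall y \in U\,((c, x, y) \in R \Rightarrow v(c' : A, y) = 1)$
$v(c : ist(c', A), x) = 0 \Leftrightarrow \exists y \in U\,((c, x, y) \in R \;\&\; v(c' : A, y) = 0)$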



Definition 10. (C, U, R, v) is said to be a Kripke context model. A formula c : A
is true in this model if for each $x \in U$ we have v(c : A, x) = 1.

A formula c : A is true in a Kripke context structure (C, U, R) if it is true in
any model (C, U, R, v) of this structure.

If $\Sigma$ is a set of Kripke context structures, then the formula c : A is true in $\Sigma$ if
and only if it is true in each structure of $\Sigma$.
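
The truth conditions of this section can be turned into a small evaluator. The sketch below rests on assumptions of our own: formulas are encoded as nested tuples such as `("imp", A, B)` or `("ist", c2, A)`, the valuation is a dictionary keyed by `(context, variable, world)`, and the accessibility relation for `c : ist(c2, A)` is indexed by the outer context c. The function names are hypothetical.

```python
def holds(U, R, v, c, f, x):
    """Truth of the context formula c : f at world x of model (C, U, R, v).

    f is a nested tuple: ("bot",), ("top",), ("var", p), ("not", A),
    ("or", A, B), ("and", A, B), ("imp", A, B) or ("ist", c2, A).
    """
    op = f[0]
    if op == "bot":
        return False
    if op == "top":
        return True
    if op == "var":
        return v.get((c, f[1], x), 0) == 1
    if op == "not":
        return not holds(U, R, v, c, f[1], x)
    if op == "or":
        return holds(U, R, v, c, f[1], x) or holds(U, R, v, c, f[2], x)
    if op == "and":
        return holds(U, R, v, c, f[1], x) and holds(U, R, v, c, f[2], x)
    if op == "imp":
        return (not holds(U, R, v, c, f[1], x)) or holds(U, R, v, c, f[2], x)
    if op == "ist":
        _, c2, a = f
        # A must hold in context c2 at every world accessible from x for c.
        return all(holds(U, R, v, c2, a, y) for y in U if (c, x, y) in R)
    raise ValueError(f"unknown connective: {op!r}")


def true_in_model(U, R, v, c, f):
    """c : f is true in the model if it holds at every world of U."""
    return all(holds(U, R, v, c, f, x) for x in U)
```

As a usage example, with worlds `{0, 1}`, relation `{("c", 0, 1), ("c", 1, 1)}` and a valuation making `p` true in context `d` at world 1 only, the formula `c : ist(d, p)` is true in the model although `d : p` is not.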



3.3 Correctness and completeness 

Definition 11. By $\Sigma_0$ we denote the set of all Kripke context structures (C, U, R)
such that $\forall c \in C\ \forall x \in U\ \exists y \in U : (c, x, y) \in R$. A relation R and a context
structure (C, U, R) with this property are said to be a serial relation and a serial
context structure respectively.
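
For a finite structure, seriality can be checked directly from the definition. A minimal sketch, with the relation assumed to be a set of `(context, world, world)` triples and `is_serial` a hypothetical helper name:

```python
def is_serial(C, U, R):
    """True when every world x has, for every context c, at least one
    world y with (c, x, y) in R."""
    return all(
        any((c, x, y) in R for y in U)
        for c in C
        for x in U
    )
```

For example, with `C = {"c"}` and `U = {0, 1}`, the relation `{("c", 0, 1), ("c", 1, 1)}` is serial, whereas `{("c", 0, 1)}` is not, because world 1 has no successor.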

Theorem 1. (correctness)

1. Each theorem in a multicontext system without the axiom $\neg\bot$ is true in each
Kripke context structure.

2. Each theorem in a multicontext system with the axiom $\neg\bot$ is true in each
Kripke context structure that belongs to $\Sigma_0$, that is, in each serial Kripke
context structure.

Definition 12. A set of formulas X is called a theory if it satisfies the following
conditions:

1. If c : A is a theorem, then $c : A \in X$.

2. If $c : A \in X$ and $c : A \rightarrow B \in X$, then $c : B \in X$ (X is closed with respect
to MP).

Definition 13. A theory X is a maximal theory if:

1. X is consistent.

2. If Y is a consistent theory and $X \subseteq Y$, then X = Y.

Lemma 1. (Lindenbaum) If X is a consistent theory and c : A is a formula
that does not belong to X, then there exists a theory Y such that $X \subseteq Y$ and $c : A \notin Y$.

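Lemma 2. Let X be a maximal theory. Then

1. $c : \neg A \in X \Leftrightarrow c : A \notin X$

2. $c : A \wedge B \in X \Leftrightarrow c : A \in X$ and $c : B \in X$

3. $c : A \vee B \in X \Leftrightarrow c : A \in X$ or $c : B \in X$

4. $c : A \rightarrow B \in X \Leftrightarrow c : A \notin X$ or $c : B \in X$

Corollary. If c : A is not a theorem, then there exists a maximal theory X such that
$c : A \notin X$.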

Notice that the following property that is true in propositional and predicate 
calculus and in modal logic as well is not true any more in the multi-context 
case. 

If X is a theory that does not contain a given formula, then there exists a maximal
theory containing X but not containing the formula.

Definition 14. Let X be a set of formulas. Then we denote
$Ist^{-1}(c, X) = \{c' : A \mid c : ist(c', A) \in X\}$.

Lemma 3. If X is a theory, then $Ist^{-1}(c, X)$ is also a theory.

Lemma 4. If X is a consistent theory, then $Ist^{-1}(c, X)$ is also a consistent
theory.

Definition 15. The Kripke context structure $(C_k, U_k, R_k)$ is said to be the Kripke
canonic context structure if $U_k$ is the set of all maximal theories and for $R_k$
it holds that

$$x, y \in U_k : (c, x, y) \in R_k \Leftrightarrow Ist^{-1}(c, x) \subseteq y$$

Lemma 5. Let $x \in U_k$. Then for any formula c : ist(c', A) it holds that:

$$c : ist(c', A) \in x \Leftrightarrow (\forall y \in U_k)((c, x, y) \in R_k \Rightarrow c' : A \in y)$$

Definition 16. $(C_k, U_k, R_k, v_k)$ is said to be the canonic context model if, for $p \in VAR$,
$c \in C$ and $x \in U_k$,

$$v_k(c : p, x) = 1 \Leftrightarrow c : p \in x$$

where $v_k$ is extended to a truth assignment of any formula as described earlier.

Lemma 6. $v_k(c : A, x) = 1 \Leftrightarrow c : A \in x$

Theorem 2. (for the canonic model)

c : A is a theorem $\Leftrightarrow$ c : A is true in the canonic model.

Theorem 3. (completeness) c : A is true in $\Sigma_0$ $\Rightarrow$ c : A is a theorem.

4 Decidability 

Consider a multicontext structure with the axiom $\neg\bot$ and a finite number of
contexts.

Definition 17. Let (C, U, R, v) be a context model and $\Gamma$ be a set of formulas
closed under the subformula relation (i.e. if $c : A \in \Gamma$ and c : B is a subformula
of c : A, then $c : B \in \Gamma$). Let $x, y \in U$. We denote

$$x \equiv y \iff (\forall c : A \in \Gamma)(v(c : A, x) = v(c : A, y))$$

Lemma 7. The relation $\equiv$ is an equivalence relation.

Definition 18. The context model $(C, U^*, R^*, v^*)$ is said to be a filter of the
context model (C, U, R, v) with respect to $\Gamma$, a set of formulas closed under the
subformula relation, if:

1. $U^* = [U]$

2. $\forall x, y \in U\ \forall c \in C$

a) If $(c, x, y) \in R$, then $(c, [x], [y]) \in R^*$.

b) If $(c, [x], [y]) \in R^*$, then for any formula $c : ist(c', A) \in \Gamma$ it holds: if
$v(c : ist(c', A), x) = 1$, then $v(c' : A, y) = 1$.

3. $v^*(c : p, [x]) = v(c : p, x)$ for all $p \in VAR$, $c : p \in \Gamma$ and $x \in U$

Theorem 4. Let $(C, U^*, R^*, v^*)$ be a filter of the context model (C, U, R, v) with
respect to $\Gamma$. Then for any formula $c : A \in \Gamma$ and for all $x \in U$ it holds:

$$v(c : A, x) = v^*(c : A, [x])$$

Lemma 8. If (C, U, R, v) is a context model and $\Gamma$ is a set of formulas closed
under the subformula relation, then there exists a filter of the model with respect
to $\Gamma$.

Lemma 9. Each filter of a serial context model is also a serial context model.

Theorem 5. The multicontext system with the axiom $\neg\bot$ has the finite model
property, i.e. if c : A is not a theorem then there exists a finite model in which
c : A is not true.

Theorem 6. (decidability) The multicontext system with the axiom $\neg\bot$ is
decidable.
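
The filter construction of Definition 18 can be sketched for a finite model as follows. The sketch is illustrative only and makes several assumptions: formulas are nested tuples, the valuation is a dictionary keyed by `(context, variable, world)`, the accessibility relation is indexed by the outer context, and R* is taken to be the smallest relation satisfying condition 2a. All names are hypothetical.

```python
def holds(U, R, v, c, f, x):
    """Truth of c : f at world x (same conventions as in Section 3.2)."""
    op = f[0]
    if op == "bot":
        return False
    if op == "top":
        return True
    if op == "var":
        return v.get((c, f[1], x), 0) == 1
    if op == "not":
        return not holds(U, R, v, c, f[1], x)
    if op == "or":
        return holds(U, R, v, c, f[1], x) or holds(U, R, v, c, f[2], x)
    if op == "and":
        return holds(U, R, v, c, f[1], x) and holds(U, R, v, c, f[2], x)
    if op == "imp":
        return (not holds(U, R, v, c, f[1], x)) or holds(U, R, v, c, f[2], x)
    if op == "ist":
        return all(holds(U, R, v, f[1], f[2], y) for y in U if (c, x, y) in R)
    raise ValueError(f"unknown connective: {op!r}")


def filtrate(U, R, v, gamma):
    """Quotient the model by agreement on gamma, a list of (context, formula)
    pairs assumed to be closed under subformulas."""
    def signature(x):
        return tuple(holds(U, R, v, c, f, x) for c, f in gamma)

    # [x] = set of worlds giving the same truth values to every formula in gamma
    cls = {x: frozenset(y for y in U if signature(y) == signature(x)) for x in U}
    u_star = set(cls.values())
    # Smallest relation satisfying condition 2a of the definition.
    r_star = {(c, cls[x], cls[y]) for (c, x, y) in R}
    v_star = {}
    for c, f in gamma:
        if f[0] == "var":
            for x in U:
                v_star[(c, f[1], cls[x])] = v.get((c, f[1], x), 0)
    return u_star, r_star, v_star, cls
```

Since gamma is finite, there are at most 2^|gamma| distinct signatures, so the quotient has at most that many worlds; this bound is what makes the finite model property usable for a decision procedure.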

5 Conclusion

The importance of formalizing contexts has motivated considerable research ef- 
forts over the last two decades. However the work done so far is not sufficient 
for understanding some important aspects of contexts from a formal point of 
view. This paper addresses a number of key logical properties of propositional 
multicontext systems. In fact the propositional contexts are an instance of a 
family of multicontext systems with respect to their languages. A similar study 
has been done for first order multicontext systems. In contrast to the proposi- 
tional contexts, there are still open questions regarding both the completeness 
and decidability of first order MCS. These two types of multicontext systems cover
a class of comparatively simple multicontext systems with respect to their
languages. It is interesting to compare the results obtained for the latter MCS with
a similar study of "mixed" multicontext systems, whose contexts are different
logical formal systems.

Abstract. Consistency-based diagnosis is a main research area in
Model-based diagnosis. Many approaches to consistency-based diagnosis
need to compute the set of conflicts to generate diagnosis candidates.
Possible conflicts are introduced as an alternative to dependency-recording
engines for conflict calculation. Given a qualitative representation of the
system description, we search for those subsystems capable of generating
predictions and, hence, capable of becoming conflicts. We define this
concept for static systems, and later on we extend the definition to deal
with continuous dynamic environments. Moreover, we explain how to do
consistency-based diagnosis using possible conflicts.



1 Introduction 

Consistency-based diagnosis uses only the description of the structure of a system 
and model of the intended behaviour of its constituents to localize malfunctioning 
components. Several techniques have been proposed to implement this theoreti- 
cally sound theory. Probably the GDE approach[8] has been the most successful 
one. Nevertheless, several drawbacks have been reported when it was applied to 
dynamic systems[4, 7]. Perhaps, the most important problem is the presence of 
feedback loops, which appears due to the use of dependency-recording engines 
in the GDE framework. 

Moreover, in the field of continuous processes the number of sensors
and their location are usually fixed in advance. Hence, the candidate refinement stage in a GDE
approach is more difficult.

As an alternative to dependency-recording engines for diagnosis of continuous
processes, in this paper we extend and formalize the possible conflict concept [13],
which reduces the computational effort of on-line diagnosis in those environments where
the number of available measurement points is fixed beforehand.

This work is organized as follows: first we introduce the concept of possible 
conflict for static systems. Second, we extend the concept to cope with dynamic 
systems, showing results in a case study. Later on we describe how to use that 
concept in classical consistency-based diagnosis. Finally, we compare our ap- 
proach to related work. 

2 The Possible Conflict Concept

In the GDE paradigm, diagnoses are obtained in an iterative cycle of behaviour 
prediction, conflict detection and candidate generation or refinement. A conflict 
is a set of correctness assumptions for system components that contradicts cur- 
rent observations. Moreover, the set of minimal conflicts characterizes the set 
of minimal diagnoses [8]. However, conflict calculation is not a trivial step and
requires additional computation effort. In fact, GDE-like systems usually rely on
dependency-recording engines to find them.

To avoid the problems related to these engines, we propose the following
insight: not every node in the conflict lattice can be a conflict. There are topological
and behavioral restrictions that limit the subsystems capable of becoming conflicts,
assuming that no bridge faults are present. To find these subsystems we work
with a qualitative representation of the system description.

From now on, model refers to a set of relations among variables describing
the behaviour of a component or subsystem. Model evaluation denotes the search
for values of one or more variables in a subsystem, given a model and a set of
known variables, and using only local resolution techniques.

In the qualitative representation we propose, only system variables and
relations among them are considered. The term relation applies to any constraint
among system variables (physical laws, expert knowledge, or control
algorithms) whatever form it adopts (quantitative or qualitative, algebraic
equation or tabular function, linear or not). In this way, the system description may
be represented as a hypergraph.

2.1 System Description as a Hypergraph 

System description, SD, defines a hypergraph $H = \{V, R\}$:

— $V = \{v_1, v_2, \ldots, v_n\}$ are the system variables.

— $V = OBS \cup NOBS$. $OBS$ is the set of observed, i.e. measured, variables,
and $NOBS$ is the set of non-observed variables.

— $R = \{r_1, r_2, \ldots, r_m\}$ is a family of subsets of $V$ and it identifies the set of
relations among system variables.

To avoid the use of dependency-recording engines, we must localize those
subsystems in SD able to become conflicts. But conflicts are linked to
discrepancies between observed and predicted values or between two predicted values.
Therefore, we must search for those subsystems able to predict one value for an
observed variable or to predict two values for a non-observed variable, that is,
subsystems which can be evaluated [11]. We have called them evaluable chains.

Definition 1. Given $H$, an evaluable chain is a partial subhypergraph $H_{ec} =
\{V_{ec}, R_{ec}\}$, $V_{ec} \subseteq V$, and $R_{ec} \subseteq R$, verifying:

1. $H_{ec}$ is connected.

2. $V_{ec} \cap OBS \neq \emptyset$.

3. $\forall v_{no} \mid v_{no} \in (V_{ec} \cap NOBS) \Rightarrow d_{H_{ec}}(v_{no}) \ge 2$.

4. Let $X = V_{ec} \cap NOBS$ be the set of unknown variables in $V_{ec}$ and let $G(H_{ec})$ be
the bipartite graph whose set of nodes corresponds on the one hand to the $x \in X$
and on the other hand to the $r_i \in R_{ec}$, where nodes are linked by an edge
iff $x \in r_i$. Then, $G(H_{ec})$ has a matching of maximum cardinality $m' = |X|$
and $|R_{ec}| \ge m' + 1$.

The second point states the need for at least one measured variable in order to
diagnose. The third point is a necessary condition for local propagation^1, while
the fourth point guarantees that the subsystem defined by $H_{ec}$ has redundancy,
which is a necessary condition to perform diagnosis.

As a result of this definition, an evaluable chain represents a set of relations 
whose variables might be either measured or evaluated using adjacent relations. 

Definition 2. An evaluable chain, $H_{ec}$, is minimal if no partial subhypergraph
$H'_{ec} \subset H_{ec}$ is an evaluable chain.

From now on, we will only consider minimal evaluable chains, since minimal 
diagnosis can be characterized from the set of minimal conflicts. 
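
The four conditions of Definition 1 and the minimality test of Definition 2 can be checked mechanically. The following is a minimal sketch, not the authors' algorithm: it assumes the adder-multiplier system analysed later in this section, encodes each relation as the set of variables it involves (the names `R`, `OBS` and all function names are illustrative), and enumerates subsets of relations by brute force, which is exponential and only reasonable for very small systems.

```python
from itertools import combinations

# Assumed encoding of the adder-multiplier system: relation name -> variables.
OBS = {"A", "B", "C", "D", "E", "F", "G"}
R = {
    "m1": {"A", "C", "X"},
    "m2": {"B", "D", "Y"},
    "m3": {"C", "E", "Z"},
    "a1": {"X", "Y", "F"},
    "a2": {"Y", "Z", "G"},
}


def is_connected(rels):
    """Relations are connected when they are linked through shared variables."""
    rels = list(rels)
    seen, stack = {rels[0]}, [rels[0]]
    while stack:
        current = stack.pop()
        for other in rels:
            if other not in seen and R[current] & R[other]:
                seen.add(other)
                stack.append(other)
    return len(seen) == len(rels)


def matching_size(unknowns, rels):
    """Size of a maximum bipartite matching (unknowns vs. relations)."""
    owner = {}  # relation -> unknown currently matched to it

    def assign(x, visited):
        for r in rels:
            if x in R[r] and r not in visited:
                visited.add(r)
                if r not in owner or assign(owner[r], visited):
                    owner[r] = x
                    return True
        return False

    return sum(assign(x, set()) for x in unknowns)


def is_evaluable_chain(rels):
    v_ec = set().union(*(R[r] for r in rels))
    unknowns = sorted(v_ec - OBS)
    return (
        is_connected(rels)                                            # condition 1
        and bool(v_ec & OBS)                                          # condition 2
        and all(sum(x in R[r] for r in rels) >= 2 for x in unknowns)  # condition 3
        and matching_size(unknowns, rels) == len(unknowns)            # condition 4
        and len(rels) >= len(unknowns) + 1
    )


def minimal_evaluable_chains():
    minimal = []
    for size in range(1, len(R) + 1):  # smallest subsets first
        for rels in combinations(sorted(R), size):
            candidate = frozenset(rels)
            if is_evaluable_chain(rels) and not any(m < candidate for m in minimal):
                minimal.append(candidate)
    return minimal


for chain in minimal_evaluable_chains():
    print(sorted(chain))
```

Because subsets are visited in order of increasing size, a chain is kept only if none of the already accepted (smaller) chains is a proper subset of it, which is how the sketch implements minimality.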

2.2 How to Do Predictions from Evaluable Chains 

$H_{ec}$ represents a necessary condition for a subsystem to be evaluable. However,
this does not suffice. We must consider the different ways a relation $r_i$ can be
locally solved. This information is usually available or can be computed, and it
must be introduced to figure out how evaluable chains can be evaluated.

We create an AND-OR graph, the evaluable model, associated with each
evaluable chain. Each edge in the evaluable chain contributes one or more AND-OR
arcs to the AND-OR graph, representing the different ways variables can be locally
propagated to evaluate the relation. An AND arc implies that every variable in
the tail of the arc must be measured or previously estimated to get the value
of the variable in the head. An OR arc requires the value of any one
variable in the tail of the arc to get the value of the variable in the head.

Different resolution methods, using only local propagation criterion, will pro- 
vide different, if any, evaluable models for each evaluable chain. 
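
As an illustration of how one relation contributes several AND arcs, the sketch below lists the three ways an adder relation can be solved locally and fires arcs until no new value can be derived. The variable names `F`, `X`, `Y` and the `(head, tails, solver)` encoding are assumptions made for this example only; OR arcs are not modelled.

```python
# Each arc: (head variable, tail variables, local solver).
# Hypothetical variable names for an adder relation F = X + Y.
ADDER_ARCS = [
    ("F", ("X", "Y"), lambda x, y: x + y),
    ("X", ("F", "Y"), lambda f, y: f - y),
    ("Y", ("F", "X"), lambda f, x: f - x),
]


def propagate(arcs, known):
    """Fire every AND arc whose tail is fully known until nothing new is derived."""
    values = dict(known)
    changed = True
    while changed:
        changed = False
        for head, tails, solve in arcs:
            if head not in values and all(t in values for t in tails):
                values[head] = solve(*(values[t] for t in tails))
                changed = True
    return values


print(propagate(ADDER_ARCS, {"F": 10, "X": 6}))
```

With only one known variable no arc can fire, which mirrors the requirement that every variable in the tail of an AND arc be measured or previously estimated.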

We introduce two concepts needed to interpret the AND-OR graph: 

Definition 3. $i$ is a leaf node iff $\Gamma^{-1}_i = \emptyset$.

Definition 4. $i$ is a possible discrepancy node iff

$(d^-_{G_m}(i) = 2 \wedge i \in NOBS) \vee (d^-_{G_m}(i) = 1 \wedge i \in OBS)$.

That is, a leaf node has no predecessor^2. Possible discrepancy nodes represent
variables which are either estimated twice because they are not measured,
or estimated once because they are measured^3.

^1 $d_{H_{ec}}(i)$ is the degree of node $i$.

^2 $\Gamma^{-1}_i$: set of predecessors of node $i$.

^3 $d^-_{G_m}(i)$: inward demi-degree of node $i$ in graph $G_m$.

Proposition 1. Let $H_m = \{V_m, R_m\}$ be the AND-OR graph induced by the local
propagation criterion on $H_{ec}$, where:

1. $V_m = V_{ec}$

2. $\forall r_i \in R_{ec} \Rightarrow r_{i_k} \in R_m, k \ge 1$

Then, the $r_i \in R_{ec}$ induce a partition in $R_m$.

$r_{i_k}$, $k \ge 1$, represents the $k$ different ways $r_i$ can locally
compute a variable using $\{x_{i_1}, \ldots, x_{i_j}\}$.

Proof. Each $r_i \in R_{ec}$ induces an equivalence class in $R_m$. Hence, by definition,
it induces a partition in $R_m$.

This is the first step towards the identification of evaluable models. 




Fig. 1. A classical example. $M_1$, $M_2$ and $M_3$ are multipliers. $A_1$ and $A_2$ are
adders. $OBS = \{A, B, C, D, E, F, G\}$. $NOBS = \{X, Y, Z\}$

Definition 5. A partial AND-OR graph in $H_m$, $H_{em} = \{V_{em}, R_{em}\}$, is an
evaluable model iff:

1. $H_{em}$ is connected.

2. $R_{em}$ is a minimal hitting-set for the partition induced by $r_i \in R_{ec}$ in $R_m$.

3. $(\forall x_i \mid x_i \in V_{em}$ and $x_i$ is a leaf node$) \Rightarrow x_i \in OBS$.

4. $\exists! x_i \in V_{em} \mid x_i$ is a discrepancy node.

The second point guarantees that every relation in the evaluable chain pro- 
vides only one AND-OR arc to the evaluable model. The third point states 
the need of measurements to start local propagation. Finally, the fourth point 
imposes that a unique node may be the origin of a discrepancy. 

We have analyzed the system in figure 1 looking for evaluable chains and 
evaluable models. To differentiate among components and relations in their mod- 
els, we use uppercase and lowercase letters respectively. If needed, indices will 
distinguish different relations in the same model. 

In the upper left corner of figure 2, we can see its related hypergraph. In the upper
right corner, we show the set of minimal evaluable chains obtained from the
hypergraph: $\{m_1, m_2, a_1\}$, $\{m_2, m_3, a_2\}$, $\{m_1, a_1, a_2, m_3\}$. In the lower left
corner, we represent all the possible ways of local propagation allowed by the
model of the adder behaviour. Finally, the selected evaluable models, in the lower
right corner, are:



$F = X + Y$ where $X = A \times C$, and $Y = B \times D$
$G = Z + Y$ where $Z = C \times E$, and $Y = B \times D$
$Y = G - Z$ and $Y = F - X$ where $X = A \times C$, and $Z = C \times E$




[Fig. 2 panels: C) AND-OR arcs for the add relation. D) Selected evaluable models.]

Fig. 2. Minimal evaluable chains and related evaluable models in the adder-
multiplier example
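
To make the role of the three selected evaluable models concrete, the sketch below evaluates them on one set of observations and reports which ones show a discrepancy. It assumes, following the caption of figure 1, that $M_1$, $M_2$ and $M_3$ compute products and $A_1$, $A_2$ compute sums; the labels `pc1` to `pc3`, the tolerance, and the observation values are illustrative and not taken from the paper.

```python
def evaluable_models(obs):
    """Return (label, components, first estimate, second estimate) per model."""
    x = obs["A"] * obs["C"]  # multiplier M1
    y = obs["B"] * obs["D"]  # multiplier M2
    z = obs["C"] * obs["E"]  # multiplier M3
    return [
        # predicted F versus observed F
        ("pc1", {"M1", "M2", "A1"}, x + y, obs["F"]),
        # predicted G versus observed G
        ("pc2", {"M2", "M3", "A2"}, z + y, obs["G"]),
        # two independent estimates of the non-observed variable Y
        ("pc3", {"M1", "A1", "A2", "M3"}, obs["F"] - x, obs["G"] - z),
    ]


def discrepancies(obs, tolerance=1e-9):
    """Labels of the evaluable models whose two estimates disagree."""
    return [
        label
        for label, _, first, second in evaluable_models(obs)
        if abs(first - second) > tolerance
    ]


# Illustrative observations: F is lower than the value the multipliers predict.
observations = {"A": 3, "B": 2, "C": 2, "D": 3, "E": 3, "F": 10, "G": 12}
print(discrepancies(observations))
```

The first two models compare a prediction with a measurement, while the third compares two estimates of the same non-observed variable, matching the two kinds of possible discrepancy node.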



2.3 Possible Conflicts and Their Relation with Real Conflicts 

Summarizing, we can identify, off-line, those subsystems capable of becoming
conflicts. However, conflicts cannot be detected without real
observations. Therefore we call these sets of relations possible conflicts.

Definition 6. A possible conflict is the set of relations found in each evaluable
chain containing at least one evaluable model.

Since each relation is provided by one component, it is straightforward to 
obtain the set of components involved in each possible conflict. 

To finish the characterization of possible conflicts, we compare them with the
conflicts obtained using dependency-recording engines.

First, we understand by $MODEL(C)$ a set of relations characterizing the
behaviour of $C$, not just a unique relation.
Let:

— $P(S)$ be the set of subsets of a given set $S$.

— $R = R_{ec} \in H_{ec}$.

— $MOD : COMPS \rightarrow P(R)$, $C \mapsto MOD(C) = \{r_i \in MODEL(C)\}$

— $COM : R \rightarrow P(COMPS)$, $r_i \mapsto COM(r_i) = \{comp \mid r_i \in MOD(comp)\}$

Proposition 2. If a GDE-like system finds a minimal conflict, $co$, from a
discrepancy in $v$, there is a minimal evaluable chain $H_{ec} = \{V_{ec}, R_{ec}\}$ such that:

$v \in V_{ec}$ and $co = \bigcup_{r_i \in R_{ec}} COM(r_i)$

The method searching for evaluable chains is exhaustive. Hence, it finds any
over-constrained system such that $|R_{ec}| = m' + 1$. These describe the minimal
set of relations needed to find a conflict: while the knowledge about one variable
in the over-constrained system is suspended, another value is estimated using
the remaining well-constrained system. Since dependency-recording engines find
the set of well-constrained systems able to do predictions [9], our method finds
these systems too, as stated in proposition 2.

Proposition 3. If a GDE-like system finds a minimal conflict, $co$, from a
discrepancy in $v$, and $H_{ec}$ is the minimal evaluable chain verifying proposition 2,
and all the evaluable models $H_{em}$ obtained from $H_{ec}$ are equivalent, then any $H_{em}$
will detect the discrepancy in $v$.

Furthermore, an evaluable model represents one of the ways to solve the set 
of relations in a minimal evaluable chain. On one hand, if all the ways to solve 
the well-constrained system, using local propagation alone, are equivalent (as in 
static linear systems) only one evaluable model suffices to detect conflicts. On 
the other hand, when solutions obtained from different evaluable models may 
differ depending on the initial starting point, we might fail to detect a conflict. 
Therefore, our set of diagnoses may be suboptimal, w.r.t. the number of conflicts 
used to compute diagnosis candidates. 

Revisiting the example in figure 1, and assuming that the set of observable
variables remains unchanged, any single or multiple fault will produce one or more
of the following conflicts: $\{M_1, M_2, A_1\}$, $\{M_2, M_3, A_2\}$, or $\{M_1, A_1, A_2, M_3\}$.
They correspond to the sets of components associated with each possible conflict.

Finally, what would happen if cycles were present? We have found two cases.
If we have an observable value within the cycle, we can do estimations (see
figure 4). Hence we have an evaluable model. Otherwise, local propagation alone
cannot be used, and the evaluable model is not valid. As reported in [9], this
problem can be solved with the super-component alternative. Nevertheless, this
last problem is outside the scope of this paper.

3 Possible Conflicts in Dynamic Environments

Let us now consider those systems whose components have state. In order to model
dynamics^4, the system description includes relations involving time derivatives. As
others have done [2, 7], we distinguish two kinds of relations: (a) instantaneous,
and (b) differential. The former kind describes static behavior and is represented
as a solid edge in the hypergraph. The latter applies to relations containing time
derivatives and is represented as a dashed edge. In the previous section, only
instantaneous relations were used.

The presence of differential relations in the models does not significantly
change the main idea behind the possible conflict concept. However, we impose
that (a) $v_i$ and $dv_i/dt$, or $v'_i$ for short, must be identified as different variables,
and (b) only relations $(v_i, v'_i)$ will be allowed. This last relation means that we
can estimate the value of the variable $v_i$ at time $t$, if we know or can estimate
the values of $v'_i$ and $v_i$ at time $t - 1$.

This last condition classifies our approach as an integration method [2], and
forces a slightly different interpretation of the evaluable models. If differential
relations are present, the evaluation process has two stages. Initially, these re- 
lations together with variable values at time t — 1 are used to estimate several 
variable values at time t. Afterwards, these values, together with current obser- 
vations, are used to estimate the rest of variables at time t. To proceed in this 
way, we have assumed that the initial state of the process is known (i.e. the 
values of state variables at the start of simulation). 
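
The two-stage evaluation can be sketched on a single tank. Everything in the example is an assumption made for illustration: the tank model (outflow proportional to the square root of the level), the parameters `area`, `k` and `dt`, and the explicit Euler step used for the differential relation. It is not the model of the case study.

```python
import math


def predict_outflow(initial_level, inflow_series, area=1.0, k=0.5, dt=1.0):
    """Predict the outflow of a hypothetical tank from a measured inflow series."""
    level = initial_level                         # known initial state
    outflow = k * math.sqrt(level)                # instantaneous relation
    d_level = (inflow_series[0] - outflow) / area
    predicted = [outflow]
    for inflow in inflow_series[1:]:
        # Stage 1: differential relation (level, d_level); values at t - 1
        # give the level at t.
        level = max(level + dt * d_level, 0.0)
        # Stage 2: instantaneous relations, fed with the current observation.
        outflow = k * math.sqrt(level)
        d_level = (inflow - outflow) / area
        predicted.append(outflow)
    return predicted


print(predict_outflow(4.0, [2.0, 2.0, 2.0]))
```

The predicted series can then be compared with the measured outflow over the same period to decide whether the corresponding possible conflict becomes a real one.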



[Figure 3: process diagram; recoverable labels: FT01, Inflow, TR-1, FT04.]

Fig. 3. The system to be diagnosed. {TR1, T2, TR2} are tanks, {P2, P3} are
pumps, and {V2} is a valve. We measure the flows FT01, FT02, FT03, FT04,
the level LT05, and the control signal to valve V2: LC05



To illustrate these concepts we will use the system shown in figure 3, which
contains elements common in many continuous industrial processes. Their
models were obtained from first-principles laws and typical control algorithms, such
as a PID controller:



where $f_i$ stands for the flow in line $i$, $h_T$ for the height in tank $T$, $\rho$ is the density,
$g$ is the acceleration of gravity, and $Cv_i$ is a parameter for pipe $i$.

^4 In this context dynamic does not mean time-varying.

In this system, we have found four possible conflicts: 



Figure 4 shows the steps to find the first possible conflict. In the left-hand
scheme the minimal evaluable chain is represented. In the right-hand scheme its
related evaluable model is shown. The conflict includes relations from the models of
components {TR1, T2, P2} and predicts the evolution of FT02.

Since solid and dashed arcs have different temporal indices, they break ap- 
parent loops in figure 4, i.e. {frli, frla}. In fact, loops become spirals[7]. 
This implies that we cannot find a diagnosis at the precise moment its symptoms
manifest. Instead, it will be localized in the last monitored period (as described
in the next section).



Fig. 4. Two steps towards identifying possible conflicts. Represented variables 
are: flows (f), heights(h), and pressures (P). /i, /g, and /n are measured vari- 
ables 

4 Consistency-Based Diagnosis Using Possible Conflicts 

Consistency-based diagnosis of dynamic systems is a complex task, because
prediction and comparison of dynamic behaviour is required. Nevertheless, several
diagnosis systems have coped with these problems in different ways [4, 7]. In this
section we explain how the possible conflict approach can be used in
consistency-based diagnosis. It can be summarized as follows:

1. Identify every minimal evaluable chain.

2. Select one evaluable model per minimal evaluable chain, and reject evaluable
chains with no evaluable model. Each model produces a possible conflict:
$pc_1, \ldots, pc_k$.

3. Build models as described by its evaluable model for each $pc_i$ in SD: $SD_{pc_i}$.

4. Iterate 4a to 4e:

(a) feed each model $SD_{pc_i}$ with system observations, $OBS_{pc_i}$. $SD_{pc_i}$
produces a set of estimations $PRED_{pc_i}$,

(b) check for discrepancy: $\| PRED_{pc_i} - OBS_{pc_i} \| > \delta$,

(c) for each $pc_i$ which finds a discrepancy, confirm $pc_i$ as a real conflict,

(d) introduce the set of components in each confirmed conflict $SD_{pc_i}$ in the
set of conflicts,

(e) compute the set of candidates to diagnosis.
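
Steps 4a to 4e can be sketched as a single function. This is only an illustration under stated assumptions: each possible conflict is represented as a pair (set of components, model function returning a predicted and an observed value), the discrepancy test is a plain absolute difference against `delta`, and candidates are computed as minimal hitting sets of the confirmed conflicts by brute force, in line with the statement in section 2 that minimal conflicts characterize minimal diagnoses [8]. Component names and models are made up.

```python
from itertools import combinations


def minimal_hitting_sets(conflicts):
    """Brute-force minimal hitting sets; exponential, for small examples only."""
    components = sorted(set().union(*conflicts))
    found = []
    for size in range(len(components) + 1):  # smallest candidates first
        for candidate in combinations(components, size):
            chosen = set(candidate)
            hits_all = all(chosen & conflict for conflict in conflicts)
            if hits_all and not any(prev <= chosen for prev in found):
                found.append(chosen)
    return found


def diagnosis_step(possible_conflicts, observations, delta):
    confirmed = []
    for components, model in possible_conflicts:
        predicted, observed = model(observations)   # step 4a
        if abs(predicted - observed) > delta:       # step 4b
            confirmed.append(set(components))       # steps 4c and 4d
    return minimal_hitting_sets(confirmed)          # step 4e


# Two made-up possible conflicts that share component "C2".
possible_conflicts = [
    ({"C1", "C2", "C3"}, lambda obs: (2.0 * obs["u"], obs["y1"])),
    ({"C2", "C4"}, lambda obs: (obs["u"] + 1.0, obs["y2"])),
]
print(diagnosis_step(possible_conflicts, {"u": 1.0, "y1": 3.0, "y2": 2.5}, delta=0.1))
```

When no possible conflict is confirmed, the only candidate returned is the empty set, that is, no component is suspected.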

In the field of continuous processes, 4b and 4e usually cannot be implemented
in a straightforward manner. The presence of dynamics and the lack of accuracy
in the models make a simple point-to-point comparison between predictions
and observations infeasible, hence we must compare their trends. Moreover, to
discriminate among competing diagnosis candidates, we cannot select new
measurement points, because they are fixed in advance. Therefore, we propose to do
consistency-based diagnosis as a combination of monitoring plus fault detection.
Periodically, we feed the possible conflict models with data series from the plant.
Afterwards, each model estimates the trajectory of several system variables in
the monitoring period. Both trajectories, measured and predicted, are compared
by means of a Dynamic Time Warping algorithm, which gives us a numeric
estimation of the global distance between both series. This dissimilarity measurement
is compared against a fixed threshold. In this way, a fault is detected only when
this value surpasses the threshold.
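
The trajectory comparison can be sketched with the textbook dynamic-programming formulation of Dynamic Time Warping. The paper does not state which variant, local cost or threshold is used, so the absolute difference as local cost, the unconstrained warping path and the function names below are assumptions.

```python
def dtw_distance(measured, predicted):
    """Classic O(n*m) Dynamic Time Warping with absolute difference as local cost."""
    n, m = len(measured), len(predicted)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(measured[i - 1] - predicted[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # measured sample repeated
                cost[i][j - 1],      # predicted sample repeated
                cost[i - 1][j - 1],  # both series advance
            )
    return cost[n][m]


def fault_detected(measured, predicted, threshold):
    """A fault is flagged only when the global distance exceeds the threshold."""
    return dtw_distance(measured, predicted) > threshold


print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

Because the warping path may repeat samples, two series with the same shape but a small time shift yield a small distance, which is what makes the comparison tolerant to model inaccuracy in timing.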

5 Discussion 

Different approaches have analyzed system structure in search of a reduction
in the computational effort of on-line model-based diagnosis. Nevertheless, this
work is not intended for the logical characterization of diagnoses [5], and we do not
use any kind of heuristic information to help discriminate among diagnosis
candidates once conflicts have been detected [11]. Conceptually, we follow a similar
pathway to that of structural residual generation [1]. However, we do not analyze
residuals in a process-control approach to model-based diagnosis [12]. Instead, we
look for subsystems able to become conflicts in a consistency-based approach to
diagnosis.

Moreover, the analysis required to find possible conflicts can be done
off-line instead of going back and forth in a causal graph once a discrepancy
has been found [10]. Similar work was reported in [6] for highly observable,
well-structured discrete-event systems. Nevertheless, our focus mechanism does not
split the system based only on observations and topology. We also decompose it
depending on relations among magnitudes. Hence, possible conflicts can share
components, as can be seen in figure 4.

Although information from conflicts may be used in the candidate rejection
phase [14, 3], this role of possible conflicts is outside the scope of this introductory
paper and is left as further work.

The main contributions of this work are that a) the possible conflict approach is
suitable for on-line diagnosis, because it avoids the computational burden related
to dependency-recording engines, and b) it overcomes the feedback loop
problem associated with dependency-recording engines in dynamic systems.



Acknowledgments 

This work has been partially funded by the Spanish M.E.C. by means of CICYT 
grants TAP98-0828 and TAP99-0344. 

References 

[1] J. P. Cassar and M. Staroswiecki. A structural approach for the design of failure
detection and identification systems. In Proc. of the IFAC-IFIP-IMACS
Conference on Control of Industrial Processes, Belfort, France, 1997.

[2] M. J. Chantler, T. Daus, S. Vikatos, and G. M. Coghill. The use of quantitative
dynamic models and dependency recording engines. In Proc. of the Seventh Intl.
Workshop on Principles of Diagnosis, pages 59-68, Val-Morin, Quebec, 1996.
NRC-CNRC, Canada.

[3] L. Chittaro, G. Guida, C. Tasso, and E. Toppano. Functional and teleological
knowledge in the multimodeling approach for reasoning about physical systems:
A case study in diagnosis. IEEE Transactions on Systems, Man and Cybernetics,
23, No. 6:1718-1751, 1993.

[4] P. Dague, P. Devès, P. Luciani, and P. Taillibert. Analog systems diagnosis. In
Proc. 9th Eur. Conf. on Artificial Intelligence, pages 173-178, Stockholm, Sweden,
1990. Also appears in Readings in Model-based Diagnosis, pp. 229-234.

[5] A. Darwiche. Model-based diagnosis using structured system descriptions.
Technical Report 97-07, Department of Mathematics, American University of Beirut,
1997.

[6] A. Darwiche and G. Provan. Exploiting system structure in model-based diagnosis
of discrete-event systems. In Proc. of the Seventh Intl. Workshop on Principles of
Diagnosis, pages 93-105, Val-Morin, Quebec, 1996. NRC-CNRC, Canada.

[7] O. Dressler. On-line diagnosis and monitoring of dynamic systems based on
qualitative models and dependency-recording diagnosis engines. In Proc. of the
European Conference on Artificial Intelligence, ECAI96, pages 461-465. John Wiley
& Sons, Ltd., 1996.

[8] W. Hamscher, L. Console, and J. de Kleer (Eds.). Readings in Model-based
Diagnosis. Morgan Kaufmann, 1992.

[9] G. Katsillis and M. J. Chantler. Can dependency-based diagnosis cope with
simultaneous equations? In Proc. of the Eighth Intl. Workshop on Principles of
Diagnosis, pages 51-59, Le Mont Saint Michel, France, 1997.

[10] P. J. Mosterman. Hybrid dynamic systems: a hybrid bond graph modeling paradigm
and its applications in diagnosis. PhD in electrical engineering, Vanderbilt
University, Nashville, Tennessee, May 1997.

[11] P. Nooteboom and G. B. Leemeijer. Focusing based on the structure of a model
in model-based diagnosis. Int. J. Man-Machine Studies, 38:455-474, 1993.

[12] R. Patton, P. Frank, and R. Clark. Fault Diagnosis in Dynamic Systems. Theory
and Applications. Prentice Hall International, 1989.

[13] B. Pulido and C. Alonso. Possible conflicts instead of conflicts to diagnose
continuous dynamic systems. In Proceedings of the Tenth Intl. Workshop on Principles
of Diagnosis, DX99, pages 234-241, Loch Awe, UK, 1999.

[14] L. Travé-Massuyès and R. Milne. Gas-turbine condition monitoring using
qualitative model-based diagnosis. IEEE Expert, pages 22-31, 1997.



An Open Approach to Distribution, Awareness 
and Cooperative Work 



Walter Balzano^1, Antonina Dattolo^2, and Vincenzo Loia^3

^1 Dipartimento di Informatica ed Applicazioni, Università di Salerno,
via S. Allende, 84081 Baronissi (SA), Italy
^2 Dipartimento di Matematica ed Applicazioni, Università di Napoli "Federico II",
Via Cinthia 45, 80126 Napoli, Italy
^3 Dipartimento di Matematica ed Informatica, Università di Salerno,
via S. Allende, 84081 Baronissi (SA), Italy



Abstract. In this paper we present CoHyDe (Collaborative Hyperme- 
dia distributed Design), an open and strongly distributed hypermedia 
model that supports distributed workgroups. The architecture is based 
on the metaphor of the actor model and is structured in three layers, each 
represented as populations of autonomous and independent actors that 
cooperate in order to achieve common goals. The model supports tempo- 
rally and geographically distributed workgroups and a current web-based 
implementation proves its applicability and functionality. 

1 Introduction 

An open collaborative framework should favour activities performed by geo- 
graphically and temporally distributed groups, supporting (a)synchronous work, 
notification of events and awareness tools. In order to achieve these goals, a dis- 
tributed groupware needs to address four important issues: 

— Distribution. People in a work group may be distributed geographically, thus 
the models must adequately manage distribution of data and tasks, not only 
at the human level, but also at the software level. 

— Communication refers to the basic ability to exchange information in any 
required form for the collaboration process between the involved parties. 

— Coordination focuses on the scheduling and ordering of the tasks performed by the
parties involved.

— Cooperation focuses on working on shared tasks in both asynchronous and 
synchronous ways. 

This work proposes an open, distributed collaborative framework, CoHyDe
(Collaborative Hypermedia distributed Design), modeled on the actor metaphor [1].
CoHyDe represents the extension of a previous adaptive hypermedia model,
HyDe [4, 5, 6, 7], towards a Web-level platform designed to support the working
activities of the partners of a large-scale project^1. The remainder of this paper is
organised as follows: section 2 describes the CoHyDe model and discusses in
detail the role of the actor-based layers that constitute its internal architecture.
Section 3 is devoted to the Web-based prototype. A short comparison
of our approach with related work is outlined in the conclusion, together with
some proposals for future work.

^1 European Raphael project "Pompeii, Regio I: Conservation Project", REF
96/412143 (A/IT/6).

2 The Architecture of CoHyDe 

The CoHyDe model is fully described in terms of autonomous and distributed
actors. Each actor [1] is a computational agent with autonomous knowledge
that performs duties in a distributed and cooperative environment. Figure 1
shows the CoHyDe architecture, devoted to the collaboration activities; it is
organized in three layers (Coordination, Access and Work). Each layer performs
activities of distribution, communication, coordination and cooperation during
the interaction with the other layers.




Fig. 1. Architecture of CoHyDe 



2.1 Coordination Layer 

The Coordination layer contains Collaboration actors (C actors for short). For each
collaboration activity there exists a unique C actor, but it can encompass several
sub-collaborations (identified as sessions), restricted to subsets of its tasks and
co-workers.

Each participant in a collaboration can create a cooperative session. All the
sessions created within a collaboration are managed and coordinated by the
same C actor, and they inherit data and functionality from it. As shown in
Figure 2, the C actor is a composite entity and can be viewed as the organization
of internal data/scripts and of a collection of sessions that evolve in time. A
Session actor inherits from the C class a subset of tasks, a subset of users, and
specific constraints and abilities.

A description of the C class is shown in Figure 3, highlighting the two separate 
sections containing respectively acquaintances and scripts. 
Fig. 2. The sessions represent a meaningful part of each C actor

Fig. 3. Description of the Collaboration class C



The acquaintances, on the left part of Figure 3, represent the internal data, 
while the scripts, on the right, enable the C actor to coordinate the other actors, 
performing local tasks and cooperating with them. 

In order to apply the WorkflowRules, C needs to know and to update the list 
of co-workers (UserList), their identifiers (UID), their profiles (Profile), their roles 
(Roles), the list of their tasks (which can evolve over time) (Tasks), the history 
of their personal interactions during the collaboration (History). 

C manages in a distributed and coordinated way: 

— the access control functionality, by direct cooperation with the Access layer; 

— the private and shared workspaces and the event notifications; 

— the history of all the cooperative processes; 

— its sessions. 
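
The relation between a C actor and its sessions can be sketched as a small data structure. This is an illustrative reading of the description above, not the CoHyDe implementation: the class and field names are hypothetical (only loosely modeled on acquaintances such as UserList, Tasks and History), and message passing between actors is replaced by a plain method call.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    name: str
    users: frozenset
    tasks: frozenset


@dataclass
class CollaborationActor:
    """Sketch of a C actor: acquaintances as fields, one script as a method."""
    name: str
    user_list: set = field(default_factory=set)
    tasks: set = field(default_factory=set)
    history: list = field(default_factory=list)
    sessions: dict = field(default_factory=dict)

    def create_session(self, name, users, tasks):
        users, tasks = frozenset(users), frozenset(tasks)
        # A session is restricted to subsets of the collaboration's users and tasks.
        if not users <= self.user_list or not tasks <= self.tasks:
            raise ValueError("session must use a subset of the collaboration")
        self.sessions[name] = Session(name, users, tasks)
        self.history.append(("session_created", name))
        return self.sessions[name]


collaboration = CollaborationActor("house3", {"ann", "bob"}, {"define contexts", "review"})
session = collaboration.create_session("room5", {"ann"}, {"define contexts"})
print(session)
```

Keeping the subset check inside the C actor reflects the statement that all sessions of a collaboration are managed and coordinated by the same actor.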

2.2 Access Layer 

This layer is composed of Access Control actors (AC actors for short). A
unique AC exists for each user. AC is responsible for initializing collaboration
activities, and for maintaining and updating the roles and access rights of
co-workers. In order to manage this dynamic knowledge, AC maintains an active
communication with the Coordination layer and a continuous cooperation with the user
workspace. Figure 4 lists the acquaintances and scripts related to this actor
population.

Fig. 4. Description of the Access Control class AC



The AC actor knows the user, the address of his or her workspace, the
addresses of the collaborations in which the user is involved and the roles of the
user in each of them. AC is also responsible for creating the workspace for a newly
accepted co-worker; subsequently, when the co-worker asks to join a collaboration
(or session), AC verifies the access rights and then communicates its consent
(or refusal) to C.

2.3 Work Layer 

The Work layer is composed of two populations of actors, Workspace and
Awareness (W and Aw).

Workspace level W manages the interface between the system and the 
user, by distinguishing private from shared activities. 

Virtual workspaces [11] improve the abstraction from the specific time con- 
straints and provide for simultaneous interaction of local and remote teams as 
well as rapid acquisition of feedback on material that must be reviewed by the 
whole group. Figure 5 lists acquaintances and scripts related to this actor pop- 
ulation. 
Fig. 5. Description of the Workspace class W

W contains the private and shared knowledge of each co-worker: references to
the storage, the user profile, and specific features related to his or her
collaborations. W updates the access rights of its co-worker thanks to the cooperation
with the AC actor. Further, W maintains logs of tasks, events and settings, and
enables the co-worker to send files or messages and to get information about
other co-workers.

Awareness level CoHyDe supports some consolidated types of aware- 
ness [14,15]: 

— Organizational awareness. It represents the knowledge of how the work group 
fits in with the larger purposes of a project or society. 

— Structural awareness. It is related to the roles, tasks and purposes of the 
people involved in the collaboration. 

— Workspace awareness. It maintains the collection of up-to-the-minute
knowledge a person holds about the state of another's interaction with the
workspace. This type of awareness includes presence awareness and
event awareness [2].

Furthermore, CoHyDe includes another type of awareness, not taken into
consideration by the current literature:

— Domain awareness. It is constituted by the information and tools that are 
specific to the application domain, and helps the user to better understand 
the actions and choices of the other co-workers. 

CoHyDe provides a unique Aw actor for each user. In this way there is a
one-to-one correspondence between Aw and W actors and a bi-directional
communication flow between them:

— W → Aw. Any time a user performs an action on his or her workspace,
the workspace W informs its Aw, so that Aw can acquaint the coordinator C
with the same event (by means of UpdateC, Figure 6); C in turn multicasts
the event to the Aw actors that participate in the collaboration.

— W ← Aw. The aim of the previous cooperation is to inform all the users
of the event that occurred. For this reason, the Aw actors send a point-to-point
event notification (by means of UpdateW, Figure 6) to their W actors. We note
that this notification action is performed on the workspaces of both active and
absent users.

Figure 6 lists the acquaintances and scripts related to this actor population. 
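
The bi-directional flow between W, Aw and C can be sketched with three small classes. This is a simplified illustration rather than the CoHyDe code: asynchronous actor messages are replaced by direct method calls, the methods `update_c` and `update_w` stand in for the UpdateC and UpdateW scripts, and all other names are hypothetical.

```python
class Coordinator:
    """Plays the role of the C actor for one collaboration."""

    def __init__(self):
        self.members = []

    def register(self, awareness):
        self.members.append(awareness)

    def update_c(self, event):
        # Multicast the event to every Aw actor in the collaboration.
        for awareness in self.members:
            awareness.notify(event)


class Workspace:
    def __init__(self, user):
        self.user = user
        self.awareness = None
        self.log = []

    def perform(self, action):
        # W -> Aw: the workspace informs its own awareness actor.
        self.awareness.on_local_action((self.user, action))

    def update_w(self, event):
        # Recorded whether or not the user is currently present.
        self.log.append(event)


class Awareness:
    def __init__(self, workspace, coordinator):
        self.workspace = workspace
        self.coordinator = coordinator
        workspace.awareness = self
        coordinator.register(self)

    def on_local_action(self, event):
        self.coordinator.update_c(event)

    def notify(self, event):
        # Aw -> W: point-to-point notification to the paired workspace.
        self.workspace.update_w(event)


coordinator = Coordinator()
alice, bob = Workspace("alice"), Workspace("bob")
Awareness(alice, coordinator)
Awareness(bob, coordinator)
alice.perform("edit context form")
print(bob.log)
```

In this sketch the originating user's workspace is notified as well, which is one reading of the statement that all users are informed of the event.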

The main role of Aw is the communication of new events to W (addressed by
AddrW) during the collaborations (recorded in CollaborationList) and sessions
(recorded in SessionList) to which the user belongs.

Most of its scripts are devoted to notification and logging activity, in order
to realize the first three types of awareness described earlier. These actions are
carried out by the scripts NotifyActions, NotifyActivity, . . . , LogNotifications.
The last script in Figure 6 (ApplyDomainAw) is a descriptive label that covers
a more general set of domain-specific tools that make the collaboration process
more effective.

Fig. 6. Description of the Awareness class Aw



3 A Web Application of CoHyDe 

The Web-based CoHyDe prototype has been implemented on top of the
Web, using HTML, Java and JavaScript. The interface of the virtual workspaces is
organized as multi-layer HTML frame-sets. The browsing of vector files is
enabled by WHIP, a public domain plug-in [17]. It supports multi-layer
vector images, the automated localization of predefined views, and interactive
pan and zoom operations on the images.



3.1 The Application Domain 

The working context, the European Raphael project, aims to favour
interaction between people from different European states, with very specific
skills, in order to preserve the cultural heritage of Pompeii. Many experts from
different backgrounds (essentially archaeologists, programmers, computer
scientists and architects) are working on the restoration and preservation of some
houses in the ruins of Pompeii. The experts need to work on a relief of walls (or on a
map of houses), often to superimpose a vector relief (or map) on the
corresponding wall (or aerial) photo, to fill in specific forms for any detail of interest,
the so-called contexts^2, and to discuss their results in such a way as to generate
scientific documentation on the houses (and parts of them).

3.2 An Example of Cooperation 

Figure 7 shows a snapshot of the Web-based CoHyDe interface. 

The right part is automatically updated by the system on the basis of 
the user actions, preferences and choices, while the left part of the window is 
dedicated to the direct actions of the co-workers. 

¹ A context can be a door, a window, a hole or a plaster trace. 

Fig. 7. A snapshot of the CoHyDe interface during the cooperation process 

The right part provides meaningful information about: 

1. the collaboration and session names (in the example, the users are working 
on the definition of the contexts related to the walls E-N-S-W (East, North, 
South and West) of room number 5 in house 3, indicated in Figure 7 
with the label Contexts - h3r5wE-N-S-W); 

2. the co-workers, their state (present, absent, temporarily absent) and their 
features; 

3. the list of tasks related to the current session (Define context names . . . , . . . ); 

4. the log of meaningful events and activity, recorded in the Awareness frame. 

The left part of Figure 7 contains: 

5. at the bottom, a discussion area, in which co-workers can discuss 
synchronously; 

6. a graphical area organized for browsing vector (or raster) images. A layer 
of this frame is supported by the WHIP plug-in [17] and enables co-workers to 
see vector images and perform a set of localization functions (pan, zoom, 
etc.), as highlighted by the pop-up menu shown in Figure 7; 

7. some speed-reference buttons that enable the user to perform actions such as 
connect and disconnect, join, leave or create a session, and show the list of 
co-workers with their profiles, identities, roles, tasks and home page addresses. 

In Figure 7 and in the next two figures, the workspace shown belongs to the 
co-worker Antes. 

In Figure 7, after a collaborative discussion, the co-workers decide to concentrate 
their attention on context 2 (the second door on the relief); for this reason 
Helen zooms in on it and extends her action to the whole group. 
Fig. 8. Swapping between vector and corresponding raster image 



The zoomed image is shown in the left window of Figure 8. The difficulty of 
defining the style of door 2 prompts the co-workers to swap the relief with its 
related photo by clicking on the Vector/Raster button. The second window in 
Figure 8 shows the photo. 



4 Conclusion 

The CoHyDe approach proposes a distributed and cooperative model to support 
collaboration on the Web. It supports (a)synchronous cooperation activities, 
is platform and browser independent, provides very general solutions to the 
classical collaboration issues, and manages group awareness beyond the 
current page. 

The current literature proposes a number of systems that support collaborative 
applications: 

— CHIPS [9], DCWA [3] and GroupKit [12] do not support session management, 
mechanisms for accessing shared information, tools for application-specific 
message exchange, or access control, nor do they offer specific support tools 
for the development and integration of these important concepts in the Web. 

— Alliance [13] is a Web-level authoring environment: it accomplishes 
distributed document management, communication and cooperation among 
distributed authors, but it provides only asynchronous collaboration 
support. 

— BSCW [2] offers basic support for cooperative work, providing a modular 
extension of the WWW's client-server architecture, without requiring 
modifications to Web clients (required by CoHyDe), servers or protocol, but it 
provides poor awareness tools, which are actually managed as asynchronous lists 
of events that occurred. An attempt in this direction is proposed by MetaWeb [16], 
which extends the BSCW system with continuous feedback on the actions 
(activity awareness) and availability of co-workers (presence awareness). 

— GroupWeb [8] and CoWeb [10] allow interaction over several pages in a 
group, but, unlike CoHyDe, provide no awareness of other users beyond 
the current page. Also, GroupWeb is browser dependent (it is based 
on a specialized browser), and CoWeb relies on functionality only available 
in a now obsolete alpha release of Java and the HotJava browser. 

Currently we are improving the synchronization mechanism and we are 
dedicating our research activity to modelling a new module that provides CoHyDe 
with more specific authoring tools; a requirement in this direction has been 
raised by the partners during the established collaborations on the Pompeii ruins 
domain. 

Abstract. Scheduling is an important aspect of automation in 
manufacturing systems. It consists in allocating a finite set of resources or 
machines over time to perform a collection of tasks or jobs while 
satisfying a set of constraints. One of the best-known and hardest scheduling 
problems is the Job Shop, for which a distributed approach based on agent 
cooperation is proposed in this paper. There are essentially two types 
of agents: Job agents and Resource agents. Different agent behaviours 
based on heuristics are proposed and experimentally compared on 
randomly generated examples. 

Keywords: Scheduling, Job Shop, Multi-Agent systems. 



1 Introduction 

The Job Shop Scheduling Problem (JSSP) is one of the hardest [12] and most 
commonly encountered scheduling problems. Because JSSP is NP-hard, a wide 
range of approaches have been proposed for solving it. These approaches fall 
into two classes: the exact or complete methods, which provide optimal 
solutions but explode with problem size, such as [1, 2, 3, 4], and the approximate 
methods, which provide "near-optimal" solutions in a "reasonable" time, 
such as [6, 9, 10, 14, 15]. In spite of this panoply of approaches, the $m \times n$ Job 
Shop scheduling problem remains difficult to solve. Hence, other approaches have 
been considered, such as distributed ones based on multi-agent systems, where 
the scheduling is carried out by a collection of agents. Among them we can 
cite [5, 11, 10]. 

Scheduling consists in allocating a finite set of resources or machines 
over time to perform a collection of tasks or jobs while satisfying a set 
of constraints. Each job is composed of one or several operations, each of which 
can be processed by one or several machines. The order in which the operations 
of a job are processed defines its process routing, according to which we 
distinguish essentially three types of factory scheduling problems: Flow Shop 
(same process routing for all jobs), Job Shop (different process routings) and 
Open Shop (unspecified process routings). The $m \times n$ Job Shop, in which we 
are interested in this paper, is defined as follows: 

- $n$ jobs $\{J_1, \ldots, J_n\}$ have to be achieved on a set of $m$ resources 
$\{M_1, \ldots, M_m\}$. 
- Each job $J_k$, $k = 1, \ldots, n$, is composed of $n_k$ operations performed 



S. A. Cerri and D. Dochev (Eds.): AIMSA 2000, LNAI 1904, pp. 132-141, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 



How to Schedule a Job Shop Problem through Agent Cooperation 



according to a predefined order specified by its process routing. This order 
defines the precedence constraint between its operations: if $O_{k,j}$ and 
$O_{k,j+1}$ are two consecutive operations of a job $J_k$, then $O_{k,j+1}$ can start 
only when $O_{k,j}$ has been completed. 
- Preemption is not tolerated, i.e., once started, an operation cannot be 
interrupted until it finishes. 
- Each job has a release date and a due date that specify its temporal 
constraints. 
- Each operation can be processed by one or several resources, and has a 
processing time that depends on the resource chosen. 
- Each resource can process only one operation at a time. This condition 
is better known as the disjunctive constraint. 
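
The three kinds of constraints listed above (precedence, temporal and disjunctive) can be made concrete with a small Python sketch. The data layout is an assumption made for illustration only: the names Operation, Assignment, precedence_ok, disjunctive_ok and temporal_ok do not come from the paper, and time is represented by integers.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    job: int            # index k of the job J_k
    index: int          # position j of the operation in the process routing
    durations: dict     # resource name -> processing time on that resource


@dataclass
class Assignment:
    op: Operation
    resource: str
    start: int

    @property
    def finish(self):
        # No preemption: an operation runs from start to start + processing time.
        return self.start + self.op.durations[self.resource]


def precedence_ok(prev, nxt):
    """O_{k,j+1} may start only when O_{k,j} has been completed."""
    return nxt.start >= prev.finish


def disjunctive_ok(a, b):
    """A resource processes only one operation at a time."""
    if a.resource != b.resource:
        return True
    return a.finish <= b.start or b.finish <= a.start


def temporal_ok(assignments, release_date, due_date):
    """All operations of a job lie between its release date and its due date."""
    return all(release_date <= a.start and a.finish <= due_date
               for a in assignments)
```

For instance, an operation with durations {"M1": 3} started at time 0 on M1 finishes at time 3, so its successor in the routing satisfies the precedence constraint only if it starts at time 3 or later.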



2 The Multi-agent Approach 

2.1 Multi-agent Architecture 

Job Shop scheduling involves two kinds of constraints: on the one hand, 
precedence and temporal constraints relative to jobs, and on the other hand, 
disjunctive constraints relative to resources. This is why we define two classes 
of agents: Job agents and Resource agents. The former are responsible for 
satisfying the precedence and temporal constraints under their jurisdiction, 
whereas the latter are responsible for enforcing their disjunctive constraints. 
In addition, Job agents are responsible for allocating their operations to one of 
their resources and fixing a start time for them. Nevertheless, these two classes 
are insufficient: an interface between this collection of agents and the user is 
needed in order to: 

- create the collection of agents needed for solving the Job Shop problem, 
- recognize whether the problem has been solved by the agents, and 
- inform the user of the result. 

Consequently, a third agent class composed of a single component is added: 
the Interface agent. The latter does not intervene in the dialogue 
between Job agents and Resource agents. The proposed model relies on the 
Eco-problem solving model [7] enriched by [8], a Multi-Agent system where each 
agent has acquaintances (agents that it knows and with which it can communicate), 
a local memory composed of its static and dynamic knowledge, and its own mailbox 
where it stores the received messages that it will later process one by one. 
Moreover, each agent, independently of its type, has a behaviour based on 
satisfaction search, with priority given to message processing. 



Job Agents. Each Job agent has as acquaintances the Resource agents that may 
perform its operations and the Interface agent. Its static knowledge consists of its 
release date, its due date, its process routing and, for each of its operations, 
the list of possible resources with the corresponding processing times. Its 
dynamic knowledge consists, for each operation, of the currently assigned resource 
with the associated start time, the temporal slack and the resource penalties. 
The temporal slack of an operation is the time interval that spans between the 
current finish time of its previous operation and the current start time of its 
next operation, minus its greatest processing time (i.e., the worst case). 
It indicates the temporal range within which the operation may be assigned 






Khaled Ghedira and Meriem Ennigrou 



without causing precedence constraint conflicts. The resource penalty 
indicates the number of times the resource has been solicited for that 
operation but has failed to find a location for it. A Job agent is satisfied when 
all its operations are assigned and all its precedence and temporal constraints 
are satisfied; in this case it does nothing. Otherwise, it tries to assign its 
operations that are not yet allocated. In the following, for a given job, we call 
its operations the operations under its responsibility, and its resources the 
resources that are likely to achieve its operations. In the same way, for a given 
operation, we call its job the job it belongs to, and its resources the resources 
that may perform it. 
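
The temporal slack defined above can be read as a window of admissible start times. The following Python sketch shows one possible reading and is an interpretation rather than the authors' definition: it assumes that the window starts at the finish time of the previous operation and ends at the start time of the next operation minus the greatest processing time, so that the operation fits whichever resource is chosen. The function names temporal_slack and start_in_slack are hypothetical.

```python
def temporal_slack(prev_finish, next_start, durations):
    """Return (earliest_start, latest_start) for an operation.

    prev_finish: current finish time of the previous operation of the job
    next_start:  current start time of the next operation of the job
    durations:   resource name -> processing time of the operation
    Assumption: the worst case is the greatest processing time.
    """
    worst_case = max(durations.values())
    return prev_finish, next_start - worst_case


def start_in_slack(start, slack):
    """True if the start time causes no precedence constraint conflict."""
    earliest, latest = slack
    return earliest <= start <= latest


slack = temporal_slack(prev_finish=4, next_start=12, durations={"R1": 3, "R2": 5})
print(slack)                      # (4, 7)
print(start_in_slack(8, slack))   # False
```

With this reading, a window whose latest start lies before its earliest start is empty, which means the operation cannot be placed without moving its neighbours in the routing.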



Resource Agents. Each Resource agent has as acquaintances the Job agents 
whose operations are likely to be fulfilled by it and the Interface agent. Its static 
knowledge consists of the list of potential operations that it might perform, with 
the corresponding processing times. Its dynamic knowledge consists of the list of 
currently allocated operations along with their start times. A Resource agent is 
satisfied when its disjunctive constraint is satisfied; in this case it does 
nothing. Otherwise, it solves all disjunctive constraint conflicts as described 
in §2.2. In the following, for a given resource, we call its operations the 
operations that it may perform. 



Interface Agent. The Interface agent has as acquaintances all Job agents and 
Resource agents. Its static knowledge consists of the list of jobs to realize and 
the list of available resources in the shop. Its dynamic knowledge consists of the 
schedule found and its makespan (i.e. the length of the time interval between the 
start time of the first operation achieved and the finish time of the last operation 
completed). The Interface agent is satisfied when all the agents are satisfied; in 
this case it provides the solution found to the user. Otherwise, it does nothing. 
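
The makespan kept by the Interface agent follows directly from its definition. The Python sketch below assumes, purely for illustration, that a schedule is a non-empty list of (start, finish) pairs; the function name makespan is ours.

```python
def makespan(schedule):
    """Length of the interval between the earliest start and the latest finish.

    schedule: non-empty iterable of (start_time, finish_time) pairs,
              one pair per scheduled operation (assumed representation).
    """
    starts, finishes = zip(*schedule)
    return max(finishes) - min(starts)


print(makespan([(0, 4), (2, 6), (5, 9)]))   # 9
```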

2.2 Global Dynamics 

Before starting the distributed solving process, the Interface agent asks the Job 
and Resource agents to initialize their environments (Tables 1 and 2, line 3), 
namely their local memory and their acquaintances. Furthermore, each Job agent 
$J_k$ determines, for each of its operations, an initial allocation that satisfies 
its precedence and temporal constraints, and initializes the penalties of each of 
its resources to zero. An initial allocation for a given operation consists 
in choosing one of its resources and selecting a start time such that: if $j = 1$ 
then start_time($O_{k,j}$) = release_date($J_k$), else start_time($O_{k,j}$) = 
finish_time($O_{k,j-1}$). Then, $J_k$ sends these initial allocations to the selected 
resources to be checked (message "Check($O_{k,j}$, start_time, processing_time)", 
Table 1, line 4). Such allocations satisfy the precedence and temporal constraints 
but not necessarily the disjunctive ones. A conflict between two operations 
assigned to a resource $R_i$ occurs when these operations overlap in time; such a 
conflict is called an overlapping conflict. 

Hence, each unsatisfied Resource agent $R_i$ seeks its own satisfaction by solving 
its overlapping conflicts one by one: it selects $O_{k,j}$, one of the two 
operations involved in a given overlapping conflict (Table 1, line 7), and sends 
the message "Select_Temporal_Location($O_{k,j}$)" (Table 1, line 8) to itself 
in order to insert $O_{k,j}$ by choosing a new start time that satisfies the 
following three conditions (function "insertion_succeeded($R_i$, $O_{k,j}$)", 
Table 1, line 11): 

- C1: $R_i$ must be available during a period that starts at the new start time 
and is at least as long as the processing time of $O_{k,j}$, in order to satisfy 
its disjunctive constraint. 
- C2: the new start time must belong to the temporal slack of $O_{k,j}$, in order 
to satisfy the precedence constraint of $J_k$. 
- C3: the finish time of $O_{k,j}$ must not exceed the due date of $J_k$, in order 
to satisfy the temporal constraint of $J_k$. 

In addition, $R_i$ updates its overlapping conflicts as a side effect (function 
"update_conflicts", Table 1, line 9). The function "insertion_succeeded($R_i$, 
$O_{k,j}$)" returns "false" if it fails to find a start time for $O_{k,j}$ 
satisfying C1, C2 and C3; otherwise, it returns "true". In the "false" case, 
$O_{k,j}$ is ejected and sent back to its Job agent $J_k$ in order to find a new 
location (message "Select_Spatial_Location($O_{k,j}$)", Table 1, line 11). 

On receiving this message (Table 2, line 4), $J_k$ first penalizes this resource 
(Table 2, line 5) and then chooses a resource $R_i$ (Table 2, line 8), the least 
penalized among the resources of $O_{k,j}$ (when two or more resources share the 
lowest penalty, $J_k$ selects one of them according to one of the heuristics 
described in §3.1). If all its possible resources have reached a predefined 
threshold, called "first_threshold", $J_k$ sends the message 
"Create_Temporal_Location($O_{k,j}$)" (Table 2, line 9) to $R_i$ in order to build 
a free location satisfying both the constraints of $J_k$ and those of $R_i$. 
Otherwise, it sends the message "Select_Temporal_Location($O_{k,j}$)" (Table 2, 
line 10) to $R_i$, asking it to find a location that satisfies both sets of 
constraints. If $R_i$ fails either to place the operation or to create a location, 
it sends the message "Select_Spatial_Location($O_{k,j}$)" (Table 1, lines 11 and 
19) to $J_k$, and so on. When all resources reach another predefined threshold, 
called "last_threshold", $J_k$ sends an interruption message (Table 2, line 6) to 
the Interface agent, informing it that it has failed to allocate one of its 
operations. On receiving this message, the Interface agent stops all the other 
agents and informs the user of the absence of solutions for the problem. 
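
The initial allocation rule, the detection of overlapping conflicts and the three conditions C1, C2 and C3 can be sketched in Python as follows. This is an illustrative sketch only: the function names and the tuple-based representation are assumptions, the slack is taken as a pair (earliest start, latest start), and the check is done for one candidate start time rather than by searching for one as insertion_succeeded does.

```python
def initial_start_times(release_date, durations):
    """Start times of O_{k,1}, ..., O_{k,n_k} in routing order.

    The first operation starts at the release date; every other operation
    starts at the finish time of its predecessor.
    """
    starts, t = [], release_date
    for d in durations:
        starts.append(t)
        t += d
    return starts


def overlapping_conflicts(assigned):
    """Pairs of operations that overlap on one resource.

    assigned: list of (name, start, duration) for a single resource.
    """
    ordered = sorted(assigned, key=lambda a: a[1])
    conflicts = []
    for i, (n1, s1, d1) in enumerate(ordered):
        for n2, s2, _ in ordered[i + 1:]:
            if s2 < s1 + d1:            # second operation starts before first ends
                conflicts.append((n1, n2))
    return conflicts


def satisfies_c1_c2_c3(start, duration, busy, slack, due_date):
    """Check one candidate start time against conditions C1, C2 and C3.

    busy:  list of (start, finish) intervals already occupied on the resource
    slack: (earliest_start, latest_start) of the operation
    """
    finish = start + duration
    c1 = all(finish <= s or f <= start for s, f in busy)   # resource is free
    c2 = slack[0] <= start <= slack[1]                      # within temporal slack
    c3 = finish <= due_date                                 # due date respected
    return c1 and c2 and c3


print(initial_start_times(2, [3, 1, 4]))                                # [2, 5, 6]
print(overlapping_conflicts([("A", 0, 3), ("B", 2, 2), ("C", 4, 1)]))   # [('A', 'B')]
```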

To create a free location, $R_i$ first saves the current context (function 
"save_current_context", Table 1, line 13), namely the start times of its 
operations, and then shifts a subset $S_{op}$ of them to the right by a duration 
$d$ (function "shift_operations", Table 1, line 17), so that it becomes available 
for a period at least as long as the processing time of $O_{k,j}$. An operation 
$O$ belongs to $S_{op}$ if it satisfies one of the following conditions (let 
$O_{prev}$ be the operation performed by $R_i$ just before $O$): 

- $O$ is involved in an overlapping conflict with $O_{k,j}$ on $R_i$. 
- The start time of $O$ minus the finish time of $O_{k,j}$ is less than $d$. 
- $O_{prev}$ belongs to $S_{op}$ and the start time of $O$ minus the finish time 
of $O_{prev}$ is less than $d$. 

The procedure "shift_operations($S_{op}$, $d$)" shifts the operations of $S_{op}$ 
by a duration equal to $d$. It returns a boolean value, which is "false" if at 
least one operation in $S_{op}$ cannot be shifted and "true" otherwise. Shifting 
an operation to the right consists in relocating it so that its new start time is 
equal to its old start time plus the duration $d$. This new start time must satisfy 
the three conditions C1, C2 and C3. Besides, shifting an operation to the right 
may require shifting its subsequent operations (according to the process routing 
of its job) so as not to violate precedence constraints. If $R_i$ fails to find a 
new location for $O_{k,j}$, it restores the old context (function 
"restore_old_context", Table 1, line 18). 

Before giving an example illustrating the agent dynamics, we describe the syntax 
used in Tables 1 and 2: 

- sendMsg(receiver, sender, "message"): "message" is sent by "sender" to 
"receiver". 
- getMsg(mailBox): retrieves the first message stored in mailBox. 
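
The construction of $S_{op}$ and of the shift duration $d$ can be sketched in Python as shown below. Several points are assumptions made for the sake of a runnable example: operations are (start, finish) pairs, operations that finish before $O_{k,j}$ starts are never shifted (the text does not say so explicitly), and the numbers used in the usage lines borrow the finish and start times quoted in the illustrative example of §2.3 while the start time of $O_{22}$ is hypothetical. The function names are ours.

```python
def shift_duration(target_start, target_duration, next_start):
    """d as computed in Table 1, line 16."""
    return target_start + target_duration - next_start


def shift_set(target, others, d):
    """Names of the operations of one resource that belong to S_op.

    target: (start, finish) of the operation O_{k,j} to insert
    others: dict name -> (start, finish) of the operations already on R_i
    """
    t_start, t_finish = target
    sop = []
    prev_name, prev_finish = None, None
    for name, (start, finish) in sorted(others.items(), key=lambda kv: kv[1][0]):
        if finish <= t_start:
            continue    # assumption: earlier operations stay where they are
        overlaps = start < t_finish and t_start < finish        # first condition
        near_target = start - t_finish < d                       # second condition
        pushed = prev_name in sop and start - prev_finish < d    # third condition
        if overlaps or near_target or pushed:
            sop.append(name)
        prev_name, prev_finish = name, finish
    return sop


# O22 (assumed to start at 2, processing time 1) must be inserted on R3.
d = shift_duration(target_start=2, target_duration=1, next_start=2)
on_r3 = {"O42": (2, 5), "O13": (5, 6), "O32": (8, 10)}
print(d)                             # 1
print(shift_set((2, 3), on_r3, d))   # ['O42', 'O13']
```

The result agrees with the narrative of the example: O42 is shifted because it overlaps O22, O13 because it immediately follows the shifted O42, and O32 stays in place because the gap before it (8 - 6 = 2) is not smaller than d.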



Table 1. Message processing relative to Resource agent R_i 

1.  m ← getMsg(mailBox); 
2.  case m of 
3.    Initialize_Environment: initialize_environment(R_i) 
4.    Check(O_{k,j}, start_time, processing_time): 
5.      Conflicts ← determine_list_of_conflicts(R_i); 
6.      For each conflict of Conflicts do 
7.        O_{k,j} ← select an operation of conflict; 
8.        sendMsg(itself, itself, "Select_Temporal_Location(O_{k,j})"); 
9.        update_conflicts 
10.   Select_Temporal_Location(O_{k,j}): 
11.     if ¬insertion_succeeded(R_i, O_{k,j}) then sendMsg(J_k, R_i, "Select_Spatial_Location(O_{k,j})"); 
12.   Create_Temporal_Location(O_{k,j}): 
13.     save_current_context; 
14.     S_op ← subset of operations assigned to R_i; 
15.     O_next ← the next operation performed by R_i after O_{k,j}; 
16.     d ← start_time(O_{k,j}) + processing_time(O_{k,j}, R_i) - start_time(O_next); 
17.     if ¬(shift_operations(S_op, d) and insertion_succeeded(R_i, O_{k,j})) then 
18.       restore_old_context; 
19.       sendMsg(J_k, R_i, "Select_Spatial_Location(O_{k,j})"); 



2.3 Illustrative Example 

Consider the $4 \times 3$ Job Shop problem defined as follows: let $J_1$, $J_2$, 
$J_3$, $J_4$ be four jobs whose subsets of operations are, respectively, 
($O_{11}$, $O_{12}$, $O_{13}$, $O_{14}$), ($O_{21}$, $O_{22}$, $O_{23}$), 
($O_{31}$, $O_{32}$) and ($O_{41}$, $O_{42}$). Let $R_1$, $R_2$, $R_3$ be three 
resources. Suppose that the penalization threshold is equal to 5. Table 3 
summarizes the processing times of a subset of operations according to the 
resource used. Figure 1a shows the Gantt chart of the current state of the 
$4 \times 3$ Job Shop problem. In addition, the current penalty of resource $R_2$ 
for operation $O_{22}$ is 5, that of resource $R_1$ for operation $O_{31}$ is 2 
and that of resource $R_2$ for operation $O_{31}$ is 2. The Resource agent $R_2$ 
is unsatisfied because operations $O_{11}$ and $O_{31}$ overlap. So, it chooses 
one of them to reallocate; suppose 
Table 2. Message processing relative to Job agent J_k 

1.  m ← getMsg(mailBox); 
2.  case m of 
3.    Initialize_Environment: initialize_environment(J_k); 
4.    Select_Spatial_Location(O_{k,j}): 
5.      increase_penalty(sender(m)); 
6.      if all penalties ≥ last_threshold then sendMsg(Interface, J_k, "Interruption"); 
7.      else 
8.        R_i ← least penalized resource for O_{k,j}; 
9.        if all penalties ≥ first_threshold then sendMsg(R_i, J_k, "Create_Temporal_Location(O_{k,j})"); 
10.       else sendMsg(R_i, J_k, "Select_Temporal_Location(O_{k,j})"); 
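
The reaction of a Job agent to a "Select_Spatial_Location" message (Table 2, lines 4 to 10) can be sketched in Python as follows. The sketch is only an approximation of the table: the function name, the dictionary of penalties and the returned (message, receiver) pair are assumptions, the value 10 used for last_threshold in the usage lines is hypothetical, and ties between equally penalized resources are broken by dictionary order here, whereas the paper breaks them with a heuristic.

```python
def on_select_spatial_location(penalties, sender, first_threshold, last_threshold):
    """Decide which message a Job agent sends after one of its operations is ejected.

    penalties: dict resource -> penalty for the ejected operation (updated in place)
    sender:    the resource that ejected the operation
    Returns a (message, receiver) pair.
    """
    penalties[sender] += 1                                        # line 5
    if all(p >= last_threshold for p in penalties.values()):      # line 6
        return ("Interruption", "Interface")
    target = min(penalties, key=penalties.get)                    # line 8
    if all(p >= first_threshold for p in penalties.values()):     # line 9
        return ("Create_Temporal_Location", target)
    return ("Select_Temporal_Location", target)                   # line 10


# Job agent J3 after R2 ejects O31 (penalties 2 and 2, threshold 5).
penalties_o31 = {"R1": 2, "R2": 2}
print(on_select_spatial_location(penalties_o31, "R2", 5, 10))
# ('Select_Temporal_Location', 'R1')
```

The usage lines reproduce the first step of the illustrative example: the penalty of R2 rises to 3, R1 becomes the least penalized resource, and since not every penalty has reached the threshold, a "Select_Temporal_Location" request is sent to R1.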



Table 3. Operation processing times 

Operations (columns): O11, O13, O14, O22, O31, O32, O41, O42 
R1: 4, 3, 3, -, 4, 5, 2, […] 
R2: […], 2, -, 2, 3, -, 4 
R3: -, 1, -, 1, -, 2, 2, 3 




that it is $O_{31}$. Because there is no possible place on resource $R_2$ 
satisfying all the problem constraints, the Resource agent $R_2$ ejects $O_{31}$ 
and sends the message "Select_Spatial_Location($O_{31}$)" to its Job agent $J_3$ 
in order to find another location. When Job agent $J_3$ receives this message, it 
first increases the penalty of $R_2$ to 3 and then selects the least penalized 
resource for $O_{31}$, in this case $R_1$, to which it sends the message 
"Select_Temporal_Location($O_{31}$)" since its penalization is still below the 
penalization threshold. $R_1$ is available during the time interval [2 ; 6], so 
$O_{31}$ will be placed on $R_1$. Similarly, $R_3$ is unsatisfied because 
operations $O_{22}$ and $O_{42}$ overlap. Let us suppose that it chooses 
$O_{22}$ to reallocate. Because there is no possible place on resource $R_3$ 
satisfying all the problem constraints, the Resource agent $R_3$ ejects $O_{22}$ 
and sends the message "Select_Spatial_Location($O_{22}$)" to its Job agent $J_2$ 
in order to find another location. Since $R_3$ is the only possible resource for 
$O_{22}$ and the penalty of $R_2$ is equal to 5, the Job agent $J_2$ sends the 
message "Create_Temporal_Location($O_{22}$)" to $R_3$, asking it to create a free 
location for $O_{22}$ such that all the problem constraints are satisfied. For 
this reason, $R_3$ shifts operations $O_{42}$ and $O_{13}$ by a duration equal to 
1, in order to make $R_3$ available for $O_{22}$. $O_{42}$ is shifted because 
it is involved in an overlapping conflict with $O_{22}$, whereas $O_{13}$ is 
shifted because $O_{42}$ is shifted and the start time of $O_{13}$ minus the 
finish time of $O_{42}$ (5-5=0) is less than 1, the processing time of $O_{22}$. 
However, $O_{32}$ is not shifted because the start time of $O_{32}$ minus the 
finish time of $O_{13}$ (8-6=2) is greater than 1. Shifting $O_{13}$ involves 
shifting $O_{14}$ in order to satisfy the precedence constraints relative to 
Job agent $J_1$. The state obtained after the above modifications is represented by 
Fig. 1. Current state and the solution found 



Figure 1b, where all agents are satisfied, so there is no more dialogue between the 
agents and the process ends at this step. The Interface agent then 
provides this solution to the user. 

3 Adding Heuristics 

Two families of heuristics have been added to the basic model described earlier: 
the first one concerns Job agents and the second one concerns Resource agents. 
The first family consists in selecting the best resource to assign to a given 
operation. The second one consists in selecting the best operation to relocate 
among the overlapping operations. 



3.1 Job Agent Heuristics 

Let $O_{k,j}$ be an operation to be relocated by its Job agent $J_k$. $J_k$ then 
selects a resource according to one of the following heuristics: 

- Heuristic H1: $J_k$ randomly selects a resource among the set of possible 
resources of $O_{k,j}$. 
- Heuristic H2: $J_k$ selects the least loaded resource in the interval in which 
$O_{k,j}$ is likely to be assigned. 
- Heuristic H3: $J_k$ selects the resource that performs the operation with 
the minimum processing time. 

Let $J_k$ be a job and $O_{k,j}$ be the operation that it tries to place on 
$R_i$ within the interval $[t_1 ; t_2]$, where $t_1$ corresponds to the 
earliest start time of $O_{k,j}$ and $t_2$ corresponds to the latest finish time 
of $O_{k,j}$. The load of $R_i$ relative to $[t_1 ; t_2]$ is obtained by summing 
the processing times of the operations already assigned to $R_i$ in $[t_1 ; t_2]$ 
and dividing by the number of these operations. 
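
The load measure and the deterministic heuristics H2 and H3 can be sketched in Python as follows. The sketch rests on assumptions that the text leaves open: an operation counts as assigned "in" the interval when it intersects it, the load of a resource with no operation in the interval is taken to be 0, and the function names are ours.

```python
def load(assigned, t1, t2):
    """Load of one resource relative to [t1 ; t2].

    assigned: list of (start, processing_time) of the operations on the resource.
    The load is the sum of the processing times of the operations that
    intersect the interval, divided by the number of these operations.
    """
    inside = [d for s, d in assigned if s < t2 and s + d > t1]
    return sum(inside) / len(inside) if inside else 0.0


def h2_least_loaded(schedules, t1, t2):
    """H2: resource with the smallest load in [t1 ; t2].

    schedules: dict resource -> list of (start, processing_time).
    """
    return min(schedules, key=lambda r: load(schedules[r], t1, t2))


def h3_fastest(durations):
    """H3: resource with the minimum processing time for the operation.

    durations: dict resource -> processing time of the operation.
    """
    return min(durations, key=durations.get)


print(load([(0, 4), (5, 2), (10, 3)], 3, 8))   # 3.0
print(h3_fastest({"R1": 4, "R2": 3}))          # R2
```

Note that the load as defined is an average processing time rather than an occupancy ratio, so a resource holding many short operations may be rated as less loaded than one holding a single long operation.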

3.2 Resource Agent Heuristics 

Let $O_{k,j}$ and $O_{x,y}$ be two operations involved in an overlapping conflict. 
The Resource agent $R_i$ then selects an operation according to one of the 
following heuristics: 

- Heuristic H4: $R_i$ randomly chooses one of the two operations $O_{k,j}$ and 
$O_{x,y}$. 
- Heuristic H5: $R_i$ chooses the operation with the minimum processing time 
between $O_{k,j}$ and $O_{x,y}$. 

4 Experimentation 

The goal of our experiments is to compare the six versions resulting from the 
heuristics defined above, in order to find the version that provides 
the best performance in terms of makespan and run time. The six versions are 
presented in Table 4. The experiments are performed on randomly generated 
examples. 

Table 4. The different versions 

Version name     Resource selection   Operation choice 
RANDOM           H1                   H4 
RANDMIN          H1                   H5 
LESSLOADRAND     H2                   H4 
LESSLOADMIN      H2                   H5 
LESSTIMERAND     H3                   H4 
LESSTIMEMIN      H3                   H5 

The generation is guided by the following four parameters: 

- Complexity degree $P$: the probability that two operations will be involved in 
an overlapping conflict, $P \in \{0.1, 0.3, 0.5, 0.7, 0.9\}$. 
- Number of jobs $N_j \in \{5, 10, 15\}$. 
- Number of operations per job $N_o \in \{5, 10, 15, 20\}$. 
- Number of resources $N_r \in \{10, 20\}$. 

Consequently, 120 configurations ($P$, $N_j$, $N_o$, $N_r$) are obtained. 
Due to the non-deterministic character of our model, we have generated 10 
examples for each configuration and taken the average. Therefore, the 
total number of generated examples is 1200. The performance of 
the six versions is assessed by the following two measures: 

- Makespan: the length of the interval between the start time of the first 
operation achieved and the finish time of the last operation completed. 
- Run time: the CPU time required to solve the problem instance. 

Two families of experimental comparisons have been selected among several to 
show the performance of the different versions in terms of makespan and run 
time. In the first one, the total number of operations ($N_j \times N_o$, 
denoted $T_{nop}$) varies while the complexity degree ($P$) is kept constant at 
0.5 and the number of resources ($N_r$) constant at 10 (Figure 2). In the second 
one, the complexity degree ($P$) varies while the total number of operations 
($T_{nop}$) is kept constant at 100 (Figure 2). 
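
The counts of 120 configurations and 1200 examples follow from the Cartesian product of the four parameter sets. The short Python check below simply enumerates them; the variable names are ours.

```python
from itertools import product

P_VALUES = (0.1, 0.3, 0.5, 0.7, 0.9)    # complexity degree P
NJ_VALUES = (5, 10, 15)                 # number of jobs N_j
NO_VALUES = (5, 10, 15, 20)             # number of operations per job N_o
NR_VALUES = (10, 20)                    # number of resources N_r
EXAMPLES_PER_CONFIGURATION = 10

# Every configuration is a tuple (P, N_j, N_o, N_r).
configurations = list(product(P_VALUES, NJ_VALUES, NO_VALUES, NR_VALUES))
total_examples = len(configurations) * EXAMPLES_PER_CONFIGURATION

# Distinct values taken by the total number of operations T_nop = N_j * N_o.
tnop_values = sorted({nj * no for nj, no in product(NJ_VALUES, NO_VALUES)})

print(len(configurations))   # 120
print(total_examples)        # 1200
```

The value $T_{nop} = 100$ used in the second family of comparisons is reachable with two (N_j, N_o) pairs, namely (5, 20) and (10, 10).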

5 Conclusion and Perspectives 

In this paper we have proposed a Multi-Agent model for solving the $m \times n$ Job 
Shop scheduling problem. Two classes of agents have been defined: Job agents, 
responsible for satisfying their respective precedence and temporal constraints, and 
Resource agents, responsible for satisfying their respective disjunctive constraints. 
Fig. 2. Makespan and run time with P=0.5 and Nr=10 



Each Job tries to allocate its associated operations in cooperation with its 
Resource acquaintances such that all their constraints are satisfied. The choice of 
the resource to allocate (resp. the operation to instantiate) is very important for 
the model's performance, namely makespan and run time. Thus, six versions based 
on heuristics have been proposed and compared on randomly generated 
examples: the "LessTimeMin" version (minimum processing time for both operation 
and resource) provides the best makespan, whereas the "LessLoadMin" version 
(least loaded resource and operation with the minimum processing time) requires 
the least run time. Moreover, the makespan and the run time of 
these versions vary linearly with the complexity degree. Furthermore, the 
experiments have shown a linear trend for the makespan as the number of 
operations varies, and a late appearance of the exponential behaviour (from 150 
operations for 10 resources). 

As far as our future work is concerned, other heuristic combinations and 
other experiments based on a makespan/run time trade-off are foreseen. In 
addition, we shall extend our model to the optimisation aspect and compare 
it with similar models. 

Abstract. This paper describes a method of multi-agent analysis and 
design for reactive, real-time information systems, relating to complex 
and risk-bearing applications. The fundamental principle consists of 
using a series of models in "cascade" to shift from an abstract 
representation of the problems to a formal representation of the directly 
programmable agent (in Java, for example). The first basic idea is not to have 
fixed goals or tasks, but rather to derive them gradually from the 
analysis of the interactions between the actors (humans or artifacts). The 
second idea aims at integrating the space-time constraints from both an 
individual and a collective point of view, concurrently. The last 
idea is neither to proceed on a hierarchical basis nor to layer 
the final architecture of the interactions between agents but, on the 
contrary, to define the acquaintance rules and their evolution according to 
the context. This paper details the various stages of this approach and 
compares them with other current work. 



1 Introduction 

The analysis of complex risk-bearing systems is directed towards information systems 
centered on processes, whose architecture is networked or distributed 
according to the viewpoints of the various actors. Wooldridge's work [1] presents 
an agent-oriented analysis and design methodology for distributed and evolutionary 
information systems. This method is based on predefined agent roles; an 
analysis of the interactions between those roles is then carried out to account 
for the collective aspect of the system. 

Kinny's work [2] also suggests an agent-oriented methodology, which uses 
modeling techniques for individual agents based on the belief-desire-intention 
paradigm. Thus, it seems interesting to us to define a hybrid approach (individual 
and collective) to design agents and agent-based systems like those mentioned 
in [3], [4] and [5]. 

Following Jennings's work [4], it proved interesting to try to improve 
interactive complex systems by making them easier to use, quicker, more robust 
and easier to design and implement. 

The models and the context specification are described below and illustrated 
with an example. 



2 Why an Agent-Oriented Method? 

For several years, a great number of object analysis and design methods have been 
developed; see the works of Cauvet [6] and Rumbaugh [7] for a quick overview. 
They share the same limits: the representation of the actors' behavior, the re-use 
of the knowledge obtained by these actors (knowledge re-engineering) and a poor 
real-time specification of space-time constraints. 

Intuitively, we think that an agent-oriented method is first of all a reasoning 
process with phases, stages, etc., a dynamic process which can mitigate some of 
those drawbacks. To represent that dynamic process, we use a number of techniques 
and models. This representation favors the emergence of dynamic associations 
(reasoning parts or "chunks"). These are evolutionary, and thus revisable, 
concepts that are embedded in formalization steps, allowing checking and 
coherence. We then implement them as agents with evolutionary programming tools 
such as real-time object languages, logical objects (CLIPS), dynamic entities 
(SCHEME) or an agent framework such as Madkit [8]. 

We now introduce three approaches: Use Case [9], Dano's work [10] and an intuitive
agent-oriented one. The Use Case approach consists in looking at the system to be
built from the outside, from the user's point of view and in terms of the expected
features. A Use Case is addressed to an actor who makes a request to the system and
expects a measurable service from it. This request gives rise to the notion of an
external event which calls on the system. A Use Case groups together a sequence of
actions to be performed by the system.

The approach proposed in Dano's work [10] is a general method for acquiring and
conceptualizing requirements which focuses first on the static aspects and then on
the dynamic ones. That approach is based on Use Cases and on Statecharts [11]. Its
innovation lies in integrating formalization into the development process of
object-oriented software.

In our view, an agent-oriented approach begins with an informal analysis of the
interactions between actors throughout a defined process, leading to the indexing
of the process by a set of triggers. Then, we use human-computer interaction
modeling methods or techniques such as the GOMS method ("Goals, Operators, Methods
and Selection rules") [12] to model activities and behavior. A formal specification
is then elaborated with formal methods, for example the ETAG grammar ("Extended
Task Action Grammar") [13]. This level allows the specification of the tasks
matching the activities and the description of the behavior of individual and
collective organizations for each of them. Design then consists in classifying and
grouping the specifications. It brings to the foreground the individual and
collective dimensions, which require a mode of communication such as the STROBE
("STReam OBject Environment") model [14]. Finally, an implementation in an object
language is created.

Unlike the approach proposed in [10], which defines the Use Case and scenarios, we
think that during the analysis step scenarios develop progressively and that the
agents cannot be identified by direct analogy with the actors in the studied
process. So, in order to determine the agents, we need to analyze the activities,
not through the actors of the real world but through the internal and external
events which give rhythm to the process.

We may note that these events are not only dependent on changes of state (states
are not yet specified because we still have no tasks). They are relative to changes
observed in the process. These changes originate in speech acts [15], which are
in fact generalized here to any kind of exchange among the actors involved in the
process, the process itself and the spatio-temporal context. The chart below
briefly summarizes the approach.

Figure 1 Summary of the three approaches

We think that an agent-oriented method cannot be based on Use Cases derived from
predefined scenarios. It must be based on dynamic observation and on the events
arising from speech acts among the actors. The method described in the following
sections therefore introduces the various steps that lead from speech acts to the
agents which implement them.



3 A Preliminary Study 

The agent paradigm is popular among researchers, although it is only progressively
being introduced in certain professional applications (such as the Web). The role
of an agent-oriented methodology is to assist in and manage the development of an
agent application during its whole life cycle. In this respect, we distinguish
three approaches :

Extension of object-oriented methods and techniques - Burmeister [16],
Jennings [1] and Kinny [2]. This approach takes advantage of the similarity between
agents and objects and of the experience gained with object technologies. Its major
disadvantage is that it lacks not only the social dimension of the agent but also
the cognitive one (mental state).

Extension of knowledge engineering methods - MAS-CommonKADS [17] and CoMoMAS [18],
which re-use the ontology libraries and tools derived from knowledge engineering
methods. The disadvantage of this extension is that the design is centralized and
does not address the social and distributed aspects of agents.

The third approach gathers all other works, such as those using a formal approach
(the DESIRE framework) [19], the hybrid approach based on dialogue and interaction
analysis applied to the medical domain [20], or approaches for designing
cooperative information agents.



4 General Framework 

Taking human behaviour into account leads us to consider that the dialogue between
actors is a dynamic design source for the process itself. The activity is then
defined as a generic unit of representation, with distributed control, whose
purpose is to be a vector of interaction between the constraints of the system
(event-driven). The mutual adaptation of the system and the user is made possible
by the real-time revision of the scheduling and of the activities.

Rather than analysing the individual and collective dynamics of the system
separately, a convergent interactive analysis as in Barber's work [21] allows us to
articulate them better, by including the distributed aspect through the multi-agent
concept. The objective is then to provide a methodological approach to the
specification and design of agents which can perceive the environment, interpret it
and act on it.

The starting point is the analysis of the dialogues in order to identify the goals
arising from the activities. The identification of tasks and their grouping allows
a dynamic design and a structural and functional composition of the system
components. During this stage, the architecture and the components are revised by
rules (conflict resolution), not as in a Guided Use Case [22] approach but with a
Case-Based Reasoning [23] type of resolution.

4.1 Process 

The first step of our approach consists of dialogue analysis, in which we manually
construct the basic set of goals required by the system. To do so, we extract the
set of necessary triggers from the dialogue exchanged between the actors of the
system and associate each trigger with the appropriate goals. Then we set up the
class of integrated activities that each trigger launches. An example is given in
the case study below.

The second step is the reification of dialogue and action. This is the analysis
step, which is made up of two sub-steps. The first sub-step allows us to elaborate
scripts for each goal. To do so, we analyze each class of integrated activity with
the GOMS method. In this way, we begin to build up the agent body. The second
sub-step is carried out in parallel and allows us to set up the basic task bases
for the system. This sub-step uses the ETAG grammar: for each method, function or
step, we specify the set of required objects, states, events, tasks and interaction
diagrams. In this way, we begin the construction of the collective model. An
example is given in the case study section to illustrate this step.

The third step is the design step. It consists in constructing the different
entities needed by the system through aggregation and classification. Then, we use
the STROBE model as a tool for testing communication between the different
entities, taken two by two, in order to determine the agents. Next, we assign to
each agent the resources extracted from the different ontologies of the system.
These resources are instantiated differently according to the context. Finally, all
the agents are implemented in Java. An example of some agents is given in the case
study section. During the second and third steps, some conflicts are detected, so a
revision of knowledge and tasks is performed at each step to avoid errors. The
chart below summarizes these steps.




Figure 2 Step summary 



4.2 Concepts 

The concepts used in our approach are process, trigger, activity, goal, script,
task, individual model and collective model. Here are some definitions of the basic
concepts :

- Trigger : not only an event which launches a group of activities to accomplish a
goal, but also any change taking effect within the context.

- Activity : a generic unit of representation. It can be considered as an
interaction vector between all the constraints of the system.

- Goal : a set of facts which describes the objective to reach.

- Agent : autonomous (self-sufficient) software in the sense of the STROBE
model [24].

4.3 Models 



The basic models that emerge from our approach are : 

Activity model : this model allows us to identify all the integrated activities of
the system. An activity is carried out by an actor, voluntarily or not, to
accomplish a goal. An activity is characterized by the method used and is specified
with the semi-formal method GOMS.

Task model : this model allows us to describe the composition of the tasks assigned
to one or more actors under a time constraint imposed by the speech act between
actors. This task base is formally represented with the ETAG grammar.

Agent model (individual) : it represents the basic characteristics of the agent,
including goals, services, reasoning mechanism, communication modules, etc. The
purpose of this model is to provide a description of all the agents used by the
system. An agent is defined by all the tasks and services associated with it.

Organizational model (collective) : this model represents the organization of the
agents. It describes the architecture, which is made up of the agents, the
relations among them and their environment. This model is itself composed of a
coordination model and a communication model.



5 Case Study 

To illustrate our approach, we study emergency health care. Our system is composed
of the following actors : the doctor, the nurse, the reception and orientation
nurse, a personal computer and the patient. All these actors cooperate to
accomplish the main goal : that the patient is properly taken in charge and cared
for. In this case study we detail the steps of our method.

Step 1 : speech act analysis 

Here we manually construct the bulk of the set of triggers together with the
resulting goals, and then set up the associated activities. For that we consider
D = {D1, D4} :

D1 : "patient arrival alone or not "

Some of the resulting goals to accomplish are :

For D1 (A1, A2), the goal is "admission of a patient"

For these triggers, here are some associated activities :

For D1 : A1 : "reception of a patient", A2 : "orientation of a patient"
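
The outcome of this first step can be pictured as a small lookup structure from
triggers to goals and activities. The following Python sketch is only an
illustration and is not part of the prototype: the class name Trigger, the table
TRIGGERS and the helper activities_for are assumptions, and only the D1 entry
given above is encoded.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trigger:
    """A trigger extracted from the dialogue (illustrative structure)."""
    ident: str
    description: str
    goal: str                                   # goal released by the trigger
    activities: Dict[str, str] = field(default_factory=dict)


# Only D1 is spelled out in the case study; other triggers are left out.
TRIGGERS = {
    "D1": Trigger(
        ident="D1",
        description="patient arrival alone or not",
        goal="admission of a patient",
        activities={
            "A1": "reception of a patient",
            "A2": "orientation of a patient",
        },
    ),
}


def activities_for(trigger_id: str) -> List[str]:
    """Return the identifiers of the activities launched by a trigger."""
    return list(TRIGGERS[trigger_id].activities)


if __name__ == "__main__":
    d1 = TRIGGERS["D1"]
    print(d1.goal, activities_for("D1"))
```

Keeping the goal and the activities on the trigger itself mirrors the order of
the manual analysis: triggers are found first, then goals, then activities.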

Step 2 : analysis 

Step 2-1 : GOMS specification

This step is performed to specify the task bodies. GOMS is the tool used to
construct the individual agent model and its body. To do so, we consider :

Rules: R2 : rule to accomplish "admission of a patient"

If (new patient) then (accomplish "admission new patient" )

Return with goal accomplished

According to this rule we obtain the following script :

S1 : method for goal "admission new patient"
step 1 : social allowances (M1)
step 2 : visual allowances (M2)
step 3 : method for goal : "create file" (M3)
step 4 : method for goal : "move to care unit" (M4)
step 5 : validation (M5)
step 6 : return goal accomplished (M6)
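
To make the structure of such a script concrete, the rule R2 and the script S1 can
be sketched as a selection rule plus a method table that is expanded depth first.
This is a minimal Python sketch under our own assumptions: the names METHODS,
select_goal and accomplish, the trace format, and the one-step placeholder bodies
of the sub-methods "create file" and "move to care unit" (which are not detailed
in this paper) are all hypothetical.

```python
# Each method is a list of steps; a step is either a primitive operator
# or a sub-goal that refers to another method.
METHODS = {
    "admission new patient": [
        ("operator", "social allowances"),    # M1
        ("operator", "visual allowances"),    # M2
        ("goal", "create file"),              # M3
        ("goal", "move to care unit"),        # M4
        ("operator", "validation"),           # M5
    ],
    # Placeholder bodies: the sub-methods are not detailed in the paper.
    "create file": [("operator", "create file")],
    "move to care unit": [("operator", "move to care unit")],
}


def select_goal(new_patient):
    """Selection rule R2: a new patient calls for the admission method."""
    return "admission new patient" if new_patient else None


def accomplish(goal, trace=None):
    """Execute the method for `goal`, expanding sub-goals depth first."""
    trace = [] if trace is None else trace
    for kind, name in METHODS[goal]:
        if kind == "goal":
            accomplish(name, trace)
        else:
            trace.append(name)
    # Last step of every method (M6 in S1): return with goal accomplished.
    trace.append("return: " + goal + " accomplished")
    return trace


if __name__ == "__main__":
    for line in accomplish(select_goal(True)):
        print(line)
```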
Step 2-2 : ETAG formalisation

ETAG is a formal approach for describing the task hierarchy by event,
independently of the agent body. It is a tool for detecting the events that
generate acquaintances for the collective model.

Type [object = identification]
Value set : traumatology | medical
End object

Type [object = box]
Value set : occupied | non occupied
End object

Entry 1 :
[task > identification patient], [event > patient], [object > patient = *P]
t1 [event > new patient], [object > patient = p]
"identification new patient"

Entry 3 :
[task > box attribution], [event > patient], [object > patient = *P], [object > Box = *B]
t4 [event > patient], [object > patient = p], [object > Box = "non occupied"]
"attribute a box to a patient"

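The way such an entry constrains a task can be illustrated by checking an event
and the state of the objects against it. The Python sketch below is an assumption
made for illustration, not an ETAG interpreter: the dictionary layout and the
names VALUE_SETS, ENTRY_3 and applicable are hypothetical, and only the two
object types and Entry 3 shown above are encoded.

```python
# Value sets of the two object types declared in the ETAG fragment.
VALUE_SETS = {
    "identification": {"traumatology", "medical"},
    "box": {"occupied", "non occupied"},
}

# Entry 3 rewritten as a dictionary (field names are assumptions).
ENTRY_3 = {
    "task": "box attribution",
    "event": "patient",
    "requires": {"box": "non occupied"},
    "label": "attribute a box to a patient",
}


def applicable(entry, event, objects):
    """True when the event matches and every required object state holds."""
    if event != entry["event"]:
        return False
    for obj_type, required in entry["requires"].items():
        value = objects.get(obj_type)
        # Reject values that are outside the declared value set.
        if value not in VALUE_SETS[obj_type]:
            return False
        if value != required:
            return False
    return True


if __name__ == "__main__":
    print(applicable(ENTRY_3, "patient", {"box": "non occupied"}))
```
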
Step 3 : Design 

In this step we classify and group the system components according to internal and
external events. We apply the STROBE model to each component to decide whether or
not it qualifies as an agent. The chart below illustrates an example of STROBE
communication between the "Box" and "SaisieOrientation" components.

Glossary

Let "SaisieOrientation" be the component which initiates the dialogue.
Let "box" be the partner component.
Let "is this box ready" be, by convention, the initialisation message sent by the
"SaisieOrientation" component to the "box" component at time t0. This message is
built dynamically during the message exchange process.

In STREAM mode, we obtain :

"box ready, box not ready" is the output sequence of the "box" component, each
element corresponding to an input.
"looking for box ready" is the task performed by the "box" component.
"box allocation" is the task performed by the "SaisieOrientation" component,
applied to the output from the "box" component.

The STROBE response :

"box ready, box not ready" = "looking for box ready" ("is the box ready")
"is the box ready" = "box allocation" ("box ready", "box not ready")



Figure 3 Communication with STROBE model 
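
The exchange of Figure 3 can be approximated with lazy message streams, each
component producing its output stream from the stream it receives. The following
Python sketch uses generators for this purpose and is only an assumption about how
the two tasks could be wired together; it is not the STROBE model itself, and the
function allocate, the box identifiers B1 to B3 and their states are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Sequence


def looking_for_box_ready(questions: Iterable[str],
                          states: Iterable[str]) -> Iterator[str]:
    """'box' component: emits one answer for each incoming question."""
    for question, state in zip(questions, states):
        if question == "is this box ready":
            yield "box ready" if state == "non occupied" else "box not ready"


def box_allocation(answers: Iterable[str],
                   boxes: Sequence[str]) -> Optional[str]:
    """'SaisieOrientation' component: allocates the first box reported ready."""
    for box, answer in zip(boxes, answers):
        if answer == "box ready":
            return box
    return None


def allocate(boxes: Sequence[str], states: Sequence[str]) -> Optional[str]:
    """Wire the two components together through lazy message streams."""
    questions = ("is this box ready" for _ in boxes)
    answers = looking_for_box_ready(questions, states)
    return box_allocation(answers, boxes)


if __name__ == "__main__":
    print(allocate(["B1", "B2", "B3"],
                   ["occupied", "non occupied", "non occupied"]))
```

Because the streams are lazy, no further question is sent once a ready box has
been found.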

Step 4 : Implementation 

Here we give the specification of the retained agents using the CIAgent framework
[25]. Figure 4 describes the prototype implemented in Java.

Figure 4 General framework of our prototype



6 Conclusion and Perspectives 

We have presented an agent-oriented analysis and design method for the
implementation of complex agent-based systems. The method is articulated around
four essential phases: the analysis of the dialogue itself among actors, the
semi-formal and formal specification of the dynamics, the revisable and concurrent
design of components, and the implementation phase. The main purpose of this paper
is the definition of a staged agent-based methodology built on the upstream
analysis of the dialogue among the system actors.

For the "RADAI" method, we claim as a significant result the combination of the
convergent ascending interactive analysis based on multiple points of view with the
specification and downward implementation of agents. This yields a flexibility
based on revisability and autonomy, whereas Jennings's work [1] presents a strict
and rigid methodology based on agents predefined through their specific roles.
However, we lose many of the advantages gained from this flexibility because of the
CIAgent framework that we used, which is based on the adaptation of agents by rules
with poor inference mechanisms. We may thus consider some improvements, such as the
use of a learning mechanism to implement agents, like the results in Baron's
work [26] on knowledge revision by genetic algorithms, or those with mutant
agents [27].

On the other hand, we noted that the methods used to specify tasks and activities
(GOMS and ETAG) are too simple, too heavy and too limited. We therefore plan to
adopt a more powerful and richer formalization method, such as the DESIRE framework
of Brazier's work [19]. Moreover, agents must have greater agility than the current
framework allows. To this end, the evolutionary and/or anticipatory models
developed by Ekdahl and Astor [28] or the FOCALE project [29] explore different
possible solutions. The prototype which we developed is a real simulator, which
makes it possible to test, verify, update and complete forthcoming knowledge in the
process of emergency health care. The agents foreseen in the system cover learning,
tracking, cooperation, help (assistant) and accreditation control aspects.
Abstract: This paper presents the early results of the CMOS project
prototype, which includes an embedded multi-agent ITS (Intelligent
Tutoring System) designed to give efficient help to learners facing
troubleshooting maintenance tasks. This environment gives responses
dedicated to aeronautical training sessions according to a three-step
principle: first to «introduce», second to «convince» and, finally, to get
the learner to do. We emphasize two main characteristics: a real-time full
simulation of the technical domain, and the tutoring multi-agent
architecture it works with, ASITS. ASITS is supplied with reactive and
cognitive agents to track the learner's performance, to detect inherent
negative effects (the learner's "cognitive gaps") and, as feedback, to
identify some deficiencies of current training simulators. As measuring
gap values with quantitative rules is sometimes hazardous, the concept of
simulation has to be extended to a Qualitative Simulation approach.

Keywords: Interactive Learning Environments, Real-time Simulation,
Intelligent Tutoring Systems, Multi-agent Systems, Graphical Interface,
Diagnostic Reasoning.



1 Introduction 

This paper describes why « intelligent » desktop simulators for individual learning
and/or team training rely on social aspects of distributed artificial intelligence.
This reflection leads us to study computationally intelligent behavior using a
specifically tailored architecture for multi-agent ITS (ASITS, Actor System for
ITS) [7, 8].

This architecture has been applied in a simulation-based learning environment in
order to allow instructors to perform an anytime assessment [2] by tracking the
learner in real time (Progressive Assessment). As feedback, we show how specialized
cognitive agents can contribute to modeling the interaction design of a learning
session in an Intelligent Desktop Trainer. In multi-agent based ITS, this
perspective raises three major problems: (i) the definition of the communication
standards for exchanging real-time information between the agents, (ii) the
indexing of expertise to facilitate 'know-how' tracking within all the relevant
domains and (iii) the cognitive interaction design.

To resolve these problems, agents must integrate adaptable strategies to monitor
the user's actions and level of attention and to provide adequate help, in the same
way as an ITS does.

In order to detect inherent negative effects (i.e. the learner's "cognitive gaps")
brought into the learning process both by imperfect or incomplete immersion in the
simulation and by insufficient learner expertise for the pedagogical strategy of
the ITS [7], we have developed a prototype¹ in the form of a fully runnable
maintenance simulator [13]. In this system, the "full-simulation" principle is
supported by an embedded architecture built on three simulation layers: Real-time
Kernel (free-play mode), distributed simulation (individual or team learning of
procedures) and qualitative simulation (evaluating the learner's cognitive gaps).

This architecture exactly matches procedural learning (aeronautical maintenance)
and makes it possible to control all three phases of learning: (i) the learner is
first introduced to the exercise in instructor-assisted mode; (ii) he/she can then
try to repeat training sequences in step-by-step mode in order to understand the
knowledge acquired and to become convinced (the agents follow the learner and
correct him/her immediately); (iii) finally, the learner does all the exercises in
free-play mode (the agents do not show their reactions during the exercise, but
they sum up all the learner's activities and deliver a summary sheet for further
debriefing with the instructor).

Finally, the system was designed with the capacity to detect changes in the
cognitive profile of the learner immediately (it runs in a real-time environment).
These changes can be assessed (positively or not) according to three learning
modes, identified as the a, b and c mechanisms (§4.3), and using a typology of
primitive cognitive gaps (tunnel effect, dropping or context gaps). The paper
focuses on how aspects of the user's behavior can be monitored by a gap evaluator
agent.



2 New Concepts for Simulation Training in Aeronautics 

2.1 Key Concepts for the Design of Task-Oriented Activities 

Degani & Wiener [5] and Boy [4] have focused on the manner in which pilots use
checklists in normal situations. As new research issues arising from this previous
work, we emphasize the search for tools better adapted to human operators, aimed at
improving the reliability of the human-system tandem. We have shown that the
learner cannot be considered as a simple "executant". He/she has to be trained to
take real-time decisions from concurrent activities: understanding of checklists
and correct execution of prescribed operations. Also, when users do not apply
procedures as expected, it is important to ask whether this should be considered as
an error, (i) if the non-application is an attempt to adapt procedures because the
operational situation is rarely identical to the expected situation, or (ii) if the
non-application is a result of improper procedure design.

¹ CMOS (Cockpit Maintenance Operation Simulator) [13] - supported by Airbus
Training - is an advanced prototype for aeronautical maintenance operators (A340).

These two activities have to be mastered in reverse mode, each acting as a control
on the other, through a specific ITS strategy, the "reverse assessment": the
learner's free interaction with the real-time kernel of the simulator is in fact
tightly coupled with an evolving checklist window which traces and monitors the
learner in ITS mode, and vice versa (fig. 2).



2.2 Different Approaches for Training Simulation

A fair model of a classical ITS simulator is Sherlock II, a learning-by-doing
system for electric fault diagnosis for F15 fighter aircraft [10]. With Sherlock,
the progressive assessment of the learner's competency became a prominent goal for
future ITS. In the same way, the concept of "full-simulation" has integrated recent
advances in desktop trainers [13] by merging three paradigms:

- Full Fidelity Simulation aims at quality assurance through a very fine grain of
represented knowledge issued from simulation modules, each devoted to a specific
aircraft system (Hydraulics, Engines, Electricity...); these modules act together
as reactive agents in the Real Time Kernel of the simulator,

- Networked Simulators dynamically distribute zoomed processes of the previous
modules over different stations, allowing the learner to focus on a precise point
with significant cognitive offloading,

- Qualitative Simulation monitors the interaction between the learner and the
simulator in terms of positive or negative effects when assessing the learner's
performance [6, 11].

When the previous functions act together on a desktop trainer, the resulting
realistic feedback can be described as “Full-Simulation” in spite of the lack of an
effective immersive interface.



3 A Prototype for “Full-Simulation” 

3.1 Human Factors in Aeronautical Training 

Developers of trainers usually seek to minimize the human information processing
and cognitive demands placed on users when they learn on simulators, especially in
safety-critical sequences or procedures. However, the way of achieving this goal
differs greatly depending on which classes of cognitive difficulties (task load,
cognitive gaps...) are to be avoided. For this preliminary work, we have used a
reduced framework for classifying different cognitive task loads:

1. Concurrent mastering of three layers of simulation: kernel, checklists, ITS
(see 3.2);

2. Splitting learner activity into two alternate but equivalent interactions
between hypertext document and window simulation; and

3. Possibility of joint interaction from multiple learners acting on networked
stations, as in the simulator.

The methods we used to perform task analysis for #1 are HCI-related ones adapted to
man-machine interaction [4], in which reactive agents for detecting cognitive task
loads and the learner's misses will be paramount. Detecting gaps during the task
analysis for #2 needs planned agents for recording the distributed design process
and replaying some portions of it if necessary. For #3 we have developed tools and
techniques to assess how users will work and perform with cognitive agents in
distributed engineering environments [8].

This will require the creation of novel methods and interfaces for real-time
tracking of the learner's activity with adaptive agents operating at any time.



3.2 Proposed Solution: Layered Architecture for the “Full Simulation” 



The concept of full simulation, as defined above, leads us to propose a
three-layered architecture for the training environment.




Figure 1. “Full simulation”: different layers 

The three-layer approach to “full simulation” imposes an ITS architecture which is
itself distributed between these 3 layers:

- in the center, the simulation kernel represents the physical object of the
simulation (an aircraft); this layer is subject to the constraints of real-time
actions/stimulations,

- the simulation follow-up layer is added above it (so as to save a trace, a
chronology of the activated commands). This level corresponds to the description
of the “profession”, i.e. to the constraints related to the norms of the profession
taught (checklists, procedures etc.),

- the third layer completes the system by adding a cognitive validation to the
pedagogical assessment of the diagnostics made on the previous layer.

Together, these three levels form a “full-simulation” environment and involve
different agents: reactive agents operating in pseudo real time in the kernel,
planned agents for error correction in procedures, and cognitive agents which need
heuristics in order to carry out the gap evaluation.
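
The division of labour between the three layers can be pictured with a small
Python sketch. It is purely illustrative and does not reproduce the CMOS or ASITS
implementation: the class names Kernel, FollowUp and CognitiveLayer, the function
run_session, the qualitative labels other than "correct operation", and the command
names used in the example are all assumptions.

```python
class Kernel:
    """Layer 1: simulated aircraft state, reacting to each command at once."""

    def __init__(self):
        self.state = {}

    def apply(self, command):
        self.state[command] = True
        return self.state


class FollowUp:
    """Layer 2: keeps a chronology of commands and compares it to a checklist."""

    def __init__(self, checklist):
        self.checklist = list(checklist)
        self.trace = []

    def record(self, command):
        self.trace.append(command)

    def deviations(self):
        """Positions where the trace departs from the prescribed order."""
        pairs = zip(self.checklist, self.trace)
        return [i for i, (expected, done) in enumerate(pairs) if expected != done]

    def missing(self):
        """Checklist items not yet reached by the learner."""
        return self.checklist[len(self.trace):]


class CognitiveLayer:
    """Layer 3: turns the follow-up diagnosis into a qualitative judgment."""

    def assess(self, follow_up):
        if follow_up.deviations():
            return "gap suspected"
        if follow_up.missing():
            return "in progress"
        return "correct operation"


def run_session(checklist, commands):
    kernel, follow_up, tutor = Kernel(), FollowUp(checklist), CognitiveLayer()
    for command in commands:
        kernel.apply(command)       # layer 1 reacts
        follow_up.record(command)   # layer 2 traces
    return tutor.assess(follow_up)  # layer 3 evaluates


if __name__ == "__main__":
    print(run_session(["step A", "step B"], ["step B", "step A"]))
```

In this sketch the kernel never judges the learner; only the upper two layers
compare what was done with what was prescribed.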
3.3 Qualitative Monitoring of the Learner’s Operations

The learner can operate in free-play mode (as in a Full Flight Simulator), but the
real-time kernel of the simulator cannot perform a quantitative comparison of the
value of the expert solution with the value of the student solution. It can only
signal whether the «cockpit» equipment and indicators are running properly or not.
This is why qualitative simulation is necessary to monitor the progressive
assessment of the learner by detecting gaps.



4 Multi-agent ITS (ASITS) 

In multi-agent based ITS, this perspective raises three major problems: (i) the 
definition of the communication standards for exchanging real-time information 
between the agents, (ii) the indexing of expertise to facilitate 'know-how' tracking 
within all the relevant domains and (iii) the cognitive interaction design. 

In order to resolve the problems raised above, agents must integrate adaptable
strategies to monitor the user's actions when mastering two concurrent activities:
understanding of checklists and correct execution of prescribed operations. These
two activities have to be managed in reverse mode, each acting as a control on the
other, through a specific ITS strategy, the “reverse assessment”: (a) the learner’s
free interaction with the real-time kernel (pseudo free play) is tightly coupled
with (b) an evolving checklist window (aeronautical procedures) which monitors the
learner in ITS mode. This strategy must be applied to the study of maintenance
procedures (AMM tasks) together with the practice of tracking and troubleshooting
procedures (TSM tasks)². Defining the curriculum therefore consists mainly in
choosing a precise set of "key-tasks" in order to give the learner a general
knowledge of the structure of the complete course and experience in handling
documentation. The structure of the meta-help window is an active hypertext
document (see “Layer 2” in figure 2).

The interaction between the three layers is not strictly planned in advance. During
the practice of the task, the trainee can choose between acting directly on a
flight-deck pushbutton (layer 1, fig. 2), checking an item on the checklist (top
left window, layer 2, in fig. 2) or even asking the ITS to trace the task step by
step (layer 3).

Problems can occur when analyzing conflicts in reasoning steps and attempting to
track possible issues in order to bridge the gap between the ITS reasoning and the
learner's reasoning. In agent-based simulation, different agents play specific
roles to achieve these goals [7]: at the first level, pedagogical agents (learner,
tutor...) act in different learning scenarios; at the second level, gap detector
agents trace their reasoning; and at the third level, cognitive evaluator agents
detect and solve conflicts in order to improve the strategy.

4.1 General Presentation of Multi-agent ITS

Three main components of an ITS (the student model, the knowledge model, and the
pedagogical model) have previously been built in the form of an intelligent agent
architecture, as in the Actor's agent [7]. It is possible to limit the number of
actors and the casting of roles by (i) viewing learning as a reactive process
involving several partners (human, simulated functions, pedagogical agents...),
(ii) adapting each advising agent to various learning strategies: co-learner,
advisor, ... [7].

² AMM = Aircraft Maintenance Manual, TSM = Trouble Shooting Manual



[Screenshot: "Layer 2: Task follow-up dashboard", showing the checklist of a
maintenance task (items "Reason for the job" and "Job Set-up") and the status
"Correct operation"]

Figure 2. Three-layered learner’s interaction during a maintenance task



4.2 Typology of Pedagogical Agents in the Simulation-Based Multi-agent ITS

According to the “users in the loop” concept [8], the general characteristics of
the agents used are the following:

- Cognitive Agents: consider different strategies and learning styles, establish
learning objectives, create, locate, track and review learning materials, e.g.,
diagnostic and assessment instruments, learning modules, mastery tests, etc.,

- Planned Agents: register changes, review/track students' progress and manage
student-tutor communications both asynchronously and synchronously,

- Reactive Agents: assign appropriate materials to students, manage student-ITS
communications synchronously (when needed) and evaluate student needs.

The remaining problem is how to classify cognitive interactions amongst a society
of cognitive agents acting together in shared initiative (or not) with the
learner(s).



4.3 Classification of Cognitive Interactions 

We need agents that mimic human behavior in learning situations. From
previous multi-agent ITS experiments, we have identified three levels of
abstraction depending on the functional aspects of the learner’s practice:

- (a-mechanism): learning as replication, where agents can provide instructional
data and a representation of pedagogical strategy, and one of them, the Tutor, is devoted
to mimicking the teacher acting in the classroom (learning is a reactive mechanism),

- (b-mechanism): learning by tautology, where demonstrations can be designed to 
guide the learner through the learning process with the help of specialized agents 
such as Tutor, Companion, Adviser... (learning as an adaptive or planned mechanism),

- (c-mechanism): learning by dynamic interactions and shared initiative, where the 
computer is more active and information is not only provided by the system, but 
can be modified and generated by the learner (learning is a cognitive mechanism). 

At the second stage, which is the current phase of the work, the ASITS
architecture makes it possible to detect in real time the emergence of deviant behaviors and
cognitive misses from the learner, which we call the cognitive gaps of the learner.

4.4 Cognitive Gaps Typology: Dropping Gap & Context Gap 

However, the success of a pure multi-agent based tutoring system depends on the
learner's motivation and self-discipline. We intend to profile such behavior using
three types of cognitive gaps, two of which are detailed here:

- the 'context gap', at the points of divergence between the purpose of the tasks
performed within an ITS and the purpose of the predicted solutions expected by the
pedagogue (which needs a b-mechanism),

- the 'dropping gap' (i.e., the rate of renunciation due to the lack of motivation and help),
which implies a c-mechanism approach.

Reducing the dropping gap through the agents’ auto-adaptation often introduces the
context gap, which breaks the sharing of initiative between the learner and the system.
Limiting this conflict requires a specialized type of actor: the gap evaluator agent.

4.5 Cognitive Agents as Gap Evaluators

The “instructor assistant” plays the role of a collaborator, and its help becomes more and
more useful because it observes, captures and generalizes the decision help given by
other agents. The learning of activities by agents was limited to two simple
mechanisms: (i) learning by the satisfaction of needs (individual), as in the agents of
two levels (reactive and planned), with the planner agent handling beliefs and presuppositions;
(ii) learning by the satisfaction of contradictions (collective), which uses a genetic
algorithm to resolve antagonistic constraints between the evolution of each
agent and the operation of the whole system.



5 Architecture for Gap-Tracking in the Multi-agent System 

The following scheme displays the organization of the different agents that form a part
(the tracking system) of the general ITS architecture based on ASITS principles.

Figure 3. Cognitive agents managing interactions with the learner



5.1 Cognitive Agents Role 

Cognitive agents are permanently present in the environment: they are created at the
launch of the application and “live” until its end. Each is represented by a single instance.

The Learner’s Gap Detector agent supervises the interactions of the learner with the system.
It relies on the know-how model to detect each gap of the learner. This gap
detection does not evaluate the gap (the severity level of the error).

The Curriculum agent controls the progression of the learner within the whole course.
Synthesizing the different problems encountered by the learner, it is responsible for
organizing learning sessions and for individualizing the progression in difficulty.

The Depositor of Instructor’s Experience agent collects preferences in order to guide the
learner according to the instructor’s personal style. On the demand of the Gap
Evaluator agent, it must analyze the gap and propose a heuristic for its qualification.



5.2 Reactive Agents Role 

Reactive agents have a different lifetime. They are created by another agent
(cognitive or reactive) and are killed when their objective is completed. Depending on the
situation, each type of reactive agent is represented by 0 to n instances.

The Gap Evaluator agent is created by the Learner’s Gap Detector agent in order to find the
real significance of the gap (negligible gap, notable, important, major error...).

The Learner Assistant agent offers help on the learner’s demand. The only
interaction of this agent with the environment occurs when the learner, after
numerous hints or helps, cannot correct his/her error. In this case, the Assistant agent
carries out, step by step, a demonstration of the correction.

The Learner’s Gap Evaluator agent creates the Observer of Instructor’s Heuristics agent.
It uses machine-learning techniques to collect the precise and heuristic
interventions of the instructor, and it also stores the acquired knowledge.



6 Experimenting with KQML and CIAgent Agents [1] 

6.1 General Architecture for "gap detector" Agents 

The general architecture of an agent in CMOS is framed on six slots (Know-how,
Curriculum Vitae, Nature, Script, Beliefs, Acquaintances). Reactive procedures are coded in the slots
"Beliefs" and "Acquaintances", planned functions in "Nature" and "Script", and cognitive
functions as knowledge bases in "Know-how" and "Curriculum Vitae".



Agent model      | Instance: Learner's gap evaluator Agent
-----------------+-----------------------------------------------------------
Know-how         | { Eval gap, Build correction, Query didactical ressource }
Curric. vitae    | { Eval gap, Build correction }
Nature           | Aperiodic
Script           | Evaluate the gap^ Instructor Agent, Compute final gap
Beliefs          | { ( Error level, 0.3 ), ( Back, 0.9 ), ( Jump, 0.2 ) }
Acquaintances    | { Gaps detector, Didactical ress manager, Curriculum }

Figure 4. General framework for ASITS agents
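As an illustration only, the six-slot frame of Figure 4 could be written down as a small Python data structure. The paper's prototype is in Java (CIAgent); the class name `AgentFrame`, the field types and the helper `untried_skills` are assumptions made for this sketch, and the reading of "Curriculum Vitae" as "skills already exercised" is an interpretation, not something the text states.

```python
from dataclasses import dataclass, field


@dataclass
class AgentFrame:
    """Hypothetical six-slot agent description mirroring Figure 4."""
    name: str
    know_how: set[str] = field(default_factory=set)          # cognitive slot
    curriculum_vitae: set[str] = field(default_factory=set)  # cognitive slot
    nature: str = "Aperiodic"                                # planned slot
    script: list[str] = field(default_factory=list)          # planned slot
    beliefs: dict[str, float] = field(default_factory=dict)  # reactive slot
    acquaintances: set[str] = field(default_factory=set)     # reactive slot

    def untried_skills(self) -> set[str]:
        # Assumed helper: know-how entries not yet listed in the curriculum vitae.
        return self.know_how - self.curriculum_vitae


# Slot values copied from Figure 4 (spelling as in the figure).
gap_evaluator = AgentFrame(
    name="Learner's gap evaluator Agent",
    know_how={"Eval gap", "Build correction", "Query didactical ressource"},
    curriculum_vitae={"Eval gap", "Build correction"},
    script=["Evaluate the gap", "Compute final gap"],
    beliefs={"Error level": 0.3, "Back": 0.9, "Jump": 0.2},
    acquaintances={"Gaps detector", "Didactical ress manager", "Curriculum"},
)
```

Keeping the reactive, planned and cognitive slots as separate fields makes the three levels of the architecture visible in the data itself.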



6.2 Agent Classes 

The ASITS architecture supports autonomous agents by defining a generic
agent-supervisor class named CIAgentS, which monitors six principal agents:

- Ag.I: initialization agent (not presented in fig. 6 because it is not permanent)

- Ag.LGD: Learner's Gap Detector

- Ag.DRM: Didactical Resources Manager

- Ag.IA: Instructor's Assistant

- Ag.C: Curriculum

- Ag.GE#n: Gap Evaluator

The first five agent types are cognitive; each is represented by a unique
instance. The last agent type is reactive: the whole system can dynamically have from 0 up
to n agents of this type, and each perishes (Dead state) at the end of its script.

The architecture presented needs truly autonomous agents. The problem of
autonomy has not been completely resolved, but the agents developed by Tim Finin using
KQML [13] give a correct answer to the design of the CMOS project and enrich the basic
CIAgent architecture.

One of the remaining problems is the correct handling of what S. Cerri calls access to
"multiple contexts" [3]. In fact, each agent, before activating itself, has to examine
with "real autonomy" whether or not the context needs its immediate engagement. As there
is no effort in KQML to model agents with multiple viewpoints, we have to embed
some rules to determine when an agent takes the initiative (see § 6.3).

^ CIAgent = Constructing Intelligent Agent.

The agents’ reasoning is built on reasoning rules. In the beginning,
only a minimum set of rules is present in the memory of the agent (the agent applies its
reasoning rules to the procedures). The instructor can get an agent to take into account
his/her own particularities or training approach: the instructor can teach rules to the
system by following the procedure and performing an action that is
correct (from the instructor’s point of view) but is not explicitly written in the
procedure. The system detects an anomaly and requests validation: either the
action performed was correct (and is recorded as a new additional rule, so that the system
becomes more flexible for the learner) or it was not (it was just the instructor’s mistake). After
validation, the new rule is recorded in the rule base and, when the learner repeats the
same action, the system is ready to respond and does not record this action as a mistake
(this analysis is carried out by the Gap Evaluator agent).
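The validation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CMOS implementation: the class `RuleBase`, its method names, and the action label "EXT B IMPULSED" are hypothetical, and a "rule" is reduced here to "action X is acceptable at step S".

```python
class RuleBase:
    """Hypothetical rule base: maps a procedure step to its accepted actions."""

    def __init__(self, procedure):
        # Start with the minimum set of rules: one accepted action per step.
        self.allowed = {step: {action} for step, action in procedure.items()}
        self.pending = []  # anomalies awaiting the instructor's validation

    def observe_instructor(self, step, action):
        """Record an instructor action; flag it if the procedure does not list it."""
        if action in self.allowed.get(step, set()):
            return "ok"
        self.pending.append((step, action))
        return "validation requested"

    def validate(self, step, action, is_correct):
        """Instructor's verdict: keep the action as a new rule, or drop it."""
        self.pending.remove((step, action))
        if is_correct:
            self.allowed.setdefault(step, set()).add(action)

    def is_gap(self, step, action):
        """A learner action is a gap if no rule accepts it at this step."""
        return action not in self.allowed.get(step, set())


rules = RuleBase({"6 0": "EXT A IMPULSED"})
status = rules.observe_instructor("6 0", "EXT B IMPULSED")  # not in the procedure
rules.validate("6 0", "EXT B IMPULSED", is_correct=True)    # becomes a new rule
```

After validation, a learner who repeats the instructor's alternative action is no longer reported as making a mistake, which is the flexibility the paragraph describes.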

A better solution to this problem can currently be obtained by using an interpreter of
KQML messages in Scheme, as suggested in the STROBE model [3], and then
compiling the resulting Scheme algorithm into Java code.

Another pending problem is what we call the "any time" activation of agents, in
accordance with the different unsynchronized values of the three "real-time" references:
real-time simulation, procedure tutoring and ITS evaluation of gaps. Real-time
activation of agents can be embedded with correct interoperability into the
CIAgent version of the CMOS prototype by using the CTJ* library, but this package
cannot handle different time references, and we are restricted to the poor alternative of
accessing multiple contexts to determine the correct activation time for each agent.



6.3 Results 



At the beginning, only the initialization agent (Ag.I) is awake. Its only task consists in
arousing the other agents (LGD, DRM, IA and C) before going to sleep itself. A series of creations
is performed by the agent I. In response to each creation message, the created agents
receive the «Wake up !» signal:

Ag.I: Create Ag.LGD
Ag.LGD: Wake up !
Ag.I: Create Ag.DRM
Ag.DRM: Wake up !
Ag.DRM: Didactical ressource chosen = AMM page
Ag.I: Create Ag.IA
Ag.IA: Wake up !
Ag.IA: Run mode = learner's mode
Ag.I: Create Ag.C
Ag.C: Wake up !
Ag.C: Storing 'the learner is starting a new exercise'
Ag.I: Falling asleep

Actions: the Didactical Resources Manager (DRM) chooses an AMM task, which
points to the associated didactical resource, in order to propose this exercise to the
learner. The Instructor's Assistant (IA) identifies the current mode of
functioning: « learner mode ». Curriculum (C) records the fact that the learner begins a
new exercise on the active [...]

New procedure 24-24-00-710-801
6 0: Action: EXT A IMPULSED
Ag.LGD: Learner's interaction detected
10 0: Action: BAT 1
Ag.LGD: Learner's interaction detected
10 2: Action: APU BAT
Ag.LGD: Learner's interaction detected
6 0: Action: EXT A IMPULSED
Ag.LGD: Learner's interaction detected
Ag.LGD: Gap detected
Ag.LGD: Create Ag.GE
Ag.GE: Wake up !
Ag.GE: Gap evaluation
Ag.GE: Gap = Learner's error
Ag.GE: Notifying learner's error to Curriculum Agent
Ag.C: Storing learner's error
Ag.GE: Dead

Normal learner's progression in the current task. Only the Learner’s Gap Detector agent
(LGD) produces internal notifications at each learner’s action (this clearly shows the
transparent real-time follow-up of the learner). But at 6.0, the learner makes a gap with
respect to the normal procedure. LGD notices it and then creates a Gap Evaluator
agent (GE). GE qualifies the gap as a learner’s mistake and notifies Curriculum of it.

* «Communicating Threads for Java» [9]

6 0: Action: EXT A IMPULSED
Ag.LGD: Learner's interaction detected
Ag.LGD: Gap detected
Ag.LGD: Create Ag.GE
Ag.GE: Wake up !
Ag.GE: Gap evaluation
Ag.GE: Gap = Learner's correction of the error
Ag.GE: Notifying learner's correction to Curriculum Agent
Ag.C: Storing learner's correction
Ag.GE: Dead

Immediately after making a mistake, the learner is informed of it and tries to
correct the error at once.

10 1: Action: BAT 2 PUSHED
Ag.LGD: Learner's interaction detected
2 2: Action: UPPER potentiometer Superieur
Ag.LGD: Learner's interaction detected
2 3: Action: LOWER potentiometer Superieur
Ag.LGD: Learner's interaction detected
4 2: Action: HYD/GREEN/ELEC IMPULSED
Ag.LGD: Learner's interaction detected
Ag.LGD: Gap detected
Ag.LGD: Create Ag.GE
Ag.GE: Wake up !
Ag.GE: Gap evaluation
Ag.GE: Gap = Learner's error
Ag.GE: Notifying learner's error to Curriculum Agent
Ag.C: Storing learner's error
Ag.GE: Dead

The rest of this chronicle shows the normal progression of the learner
before he makes a new mistake at 4.2 (pushing the start button of the
hydraulic pump HYD/GREEN/ELEC, an action that the learner performs ahead of
its place in the procedure). The hypothesis to verify is whether this
is a context gap.

4 2: Action: HYD/GREEN/ELEC IMPULSED
Ag.LGD: Learner's interaction detected
Ag.LGD: Gap detected
Ag.LGD: Create Ag.GE
Ag.GE: Wake up !
Ag.GE: Gap evaluation
Ag.GE: Gap = Learner's correction of the error
Ag.GE: Notifying learner's correction to Curriculum Agent
Ag.C: Storing learner's correction
Ag.GE: Dead

The same mechanisms as in the action 6.0 are launched in order to guide
the learner as efficiently as possible. But this time, it doesn’t work,
because the learner is [...]
1 0: Action: Ecam C/B
Ag.LGD: Learner's interaction detected
3 2: Action: Beam C/B
Ag.LGD: Learner's interaction detected
2 0: Action: Beam HYD
Ag.LGD: Learner's interaction detected
0 0: Action: Low air press BLUE
Ag.LGD: Learner's interaction detected
0 1: Action: Low air press GREEN
Ag.LGD: Learner's interaction detected
0 2: Action: Low air press YELLOW
Ag.LGD: Learner's interaction detected
4 2: Action: HYD/GREEN/ELEC
Ag.LGD: Learner's interaction detected
4 2: Action: ACCESS BUS SHED
Ag.LGD: Learner's interaction detected

After he sees that it doesn’t work, the learner tries a series of actions
(1.0, 3.2, 2.0, ...) which leads him to a new attempt at 4.2 after 0.2 and...
this time, it is correct: bus access is granted (ACCESS BUS SHED) because the
spontaneous previous actions put him back in the right context.

The error indication has just disappeared. It can be classified as a
“context gap”. This fact may be validated by the instructor during the
debriefing.
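The message flow visible in this chronicle (LGD detects a gap, creates a GE, GE qualifies it and notifies Curriculum, then dies) can be replayed with a small sketch. It rests on two simplifying assumptions that the paper does not state: a gap is any action that differs from the next expected step, and a gap that repeats the action of the immediately preceding error counts as its correction. The function name `simulate` and the shortened action labels are illustrative.

```python
def simulate(expected, actions):
    """Replay learner actions; return the log lines and what Ag.C stored."""
    log, curriculum = [], []
    step = 0            # index of the next expected step of the procedure
    last_error = None   # action of the most recent uncorrected error
    for action in actions:
        log.append(f"Action: {action}")
        log.append("Ag.LGD: Learner's interaction detected")
        if step < len(expected) and action == expected[step]:
            step += 1
            last_error = None
            continue
        # Deviation from the procedure: a reactive Gap Evaluator is created.
        log += ["Ag.LGD: Gap detected", "Ag.LGD: Create Ag.GE",
                "Ag.GE: Wake up !", "Ag.GE: Gap evaluation"]
        if action == last_error:
            kind, last_error = "Learner's correction of the error", None
        else:
            kind, last_error = "Learner's error", action
        log.append(f"Ag.GE: Gap = {kind}")
        curriculum.append((action, kind))   # stored by Ag.C
        log.append("Ag.GE: Dead")           # the reactive agent ends with its script
    return log, curriculum


expected = ["EXT A", "BAT 1", "APU BAT", "BAT 2"]
actions = ["EXT A", "BAT 1", "APU BAT", "EXT A", "EXT A", "BAT 2"]
log, stored = simulate(expected, actions)
```

With this input, the second and third presses of EXT A reproduce the error and correction pair of the chronicle, and each evaluation is handled by a short-lived evaluator.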



6.4 Example of Cognitive "know how" for the Learner Gap Detector Agent 

Contextual gap detection is based on the idea that an agent should have explicit
knowledge about the contexts in which it may find itself, and then use that knowledge when
acting in those contexts. In our approach, this knowledge is represented as contextual
schemas (Turner, 1994). Each contextual schema (c-schema) is a pair:

<TL = Task List> : <LA = Learner's situated Actions>

Therefore, the local detection of a learner gap is restricted to a given context
<TL:LA>; more generally, $\langle c', c \rangle$ is evaluated by a logic function $f$. This context
evolves in real time and must be evaluated "any time" (i.e. at each learner action).
Let $c, c'$ be an n-tuple of distinct contexts and $f$ a functional evaluation of an action in
context:

$\langle c', c \rangle : f$ ($f$, true for $c$, is true for $c'$)

Let $\sigma = \langle c_1, \ldots, c_n \rangle$ with $\forall i \in (1, \ldots, n),\ c_i \in C$.

$\sigma$ is a sequence in the set of contexts $\Sigma$, constructed as a symbolic expression on $C$ (the set
of context symbols). A language to identify sequences (as contexts) can be
defined by:

if $\varphi \in L_p$ and $\sigma \in \Sigma$, then $\sigma : \varphi$

A context change is denoted by:

$e : \sigma_x : \varphi_x \rightarrow \sigma_y : \varphi_y$

with $\sigma_x \in \Sigma$, $\sigma_y \in \Sigma$, and $\varphi_x, \varphi_y \in L_p$ ^

and the associated functional expression of the corresponding c-schema is

$\sigma_y : \varphi_y = e(\sigma_x : \varphi_x)$

A contextual learner's behavior can be represented by n changes in the context:

$b = (e_1, e_2, \ldots, e_n)$

To compare and evaluate these changes, an LGD agent can use three basic axioms
and three logic functions:

- Axioms:

1 - $\sigma : \varphi$

2 - $\sigma : ist(\chi, \varphi \rightarrow \psi) \rightarrow (ist(\chi, \varphi) \rightarrow ist(\chi, \psi))$

3 - $\sigma : ist(\chi, \varphi) \rightarrow \neg\, ist(\chi, \neg\varphi)$

- Logic functions:

Modus Ponens:

$\vdash \sigma : \varphi,\ \ \vdash \sigma : \varphi \rightarrow \psi \ \Rightarrow\ \vdash \sigma : \psi$

Rule as precondition to enter into a new context:

$\vdash \langle \chi_1, \ldots, \chi_n \rangle : ist(\chi, \varphi) \ \Rightarrow\ \vdash \langle \chi_1, \ldots, \chi_n, \chi \rangle : \varphi$

Rule as postcondition to quit an identified context:

$\vdash \langle \chi_1, \ldots, \chi_n, \chi \rangle : \varphi \ \Rightarrow\ \vdash \langle \chi_1, \ldots, \chi_n \rangle : ist(\chi, \varphi)$

Equipped with this basic knowledge, an LGD agent detects the following changes:

For Action 10 1 in § 6.3:

$\langle c_1, c_2 \rangle$ : BAT2 state unknown $\rightarrow$ $\langle c_1, c_2 \rangle$ : BAT2 pushed
(No learner's error or gap detected)

For Action 4 2, with context $\sigma$ = (1 0, 3 2, 2 0, 0 0, 0 1, 0 2):

The context change detected by the LGD agent is

$\sigma$ : HYD/GREEN/ELEC state unknown $\rightarrow$ $\sigma$ : HYD/GREEN/ELEC impulsed
(No learner's error or gap detected)

Continuing 4 2, but with the changed context $\sigma$ = (1 0, 3 2, 2 0, 0 0, 0 1, 0 2, 4 2):

The context change detected by the LGD agent is

$\sigma_{x+1}$ : ACCESS BUS none $\rightarrow$ $\sigma_{x+1}$ : ACCESS BUS SHED
(This correct action is error-free and allows the learner to access data from the
simulator for the next task.)

^ Propositional logic
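To make the role of the context sequence concrete, here is a minimal sketch of context-dependent gap detection. It is not the paper's logic of `ist` formulas: a context is simply the tuple of actions performed so far, and the table `REQUIRED_CONTEXT` (which actions must precede action 4 2) is a hypothetical assumption chosen so that the example mirrors the chronicle; the paper does not list these preconditions.

```python
from typing import NamedTuple


class ContextChange(NamedTuple):
    context: tuple  # sigma: the sequence of actions performed so far
    before: str     # proposition holding before the action
    after: str      # proposition holding after the action


# Hypothetical preconditions: actions that must already be part of the context.
REQUIRED_CONTEXT = {"4 2": {"0 0", "0 1", "0 2"}}


def detect_change(state, context, action, variable, value):
    """Return the context change and whether it is a context gap."""
    before = state.get(variable, "state unknown")
    change = ContextChange(tuple(context),
                           f"{variable} {before}", f"{variable} {value}")
    is_gap = not (REQUIRED_CONTEXT.get(action, set()) <= set(context))
    if not is_gap:
        state[variable] = value  # the change is accepted only in the right context
    return change, is_gap


state = {}
# Too early: the required actions are not yet in the context.
_, early_gap = detect_change(state, ["10 1"], "4 2", "HYD/GREEN/ELEC", "impulsed")
# Same action, but after the context has been re-established.
sigma = ["1 0", "3 2", "2 0", "0 0", "0 1", "0 2"]
change, late_gap = detect_change(state, sigma, "4 2", "HYD/GREEN/ELEC", "impulsed")
```

The same action is a gap in one context and correct in another, which is the behavior the LGD agent has to recognize as a "context gap" rather than a plain error.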



7 Conclusion 

We began experimenting with this prototype first with senior engineers and
instructors, all along the development cycle. We now run the experiments with novice
learners: students from the Institut de Maintenance Aeronautique at Bordeaux and
from the Institut Universitaire de Technologie at Bayonne, with the aim of classifying the
different cognitive task loads during the learner’s interactivity.

Based on three functional aspects of learning, identified as the a, b and c mechanisms,
and on a rather primitive typology of cognitive gaps (Tunnel Effect, Dropping, and
Context Gaps), we have shown which aspects of a user's behavior can be
monitored by a gap evaluator agent.

The prototype is implemented with a Java repository of agents (CIAgent), a deliberately limited
actor architecture, and the Gap Evaluator agent. This Gap Evaluator agent is
responsible for qualifying the gap, from "nothing important" to "major
misunderstanding", for a given type of gap (dropping, context or tunnel) in the a, b or c
mechanism.

The major scientific obstacle consists in identifying the learner’s behavioral aspects
that can be captured, controlled and learned by different agents (cognitive or not) in
shared initiative between the learner and the system.

We plan to extend the rather poor learning capabilities of the current cognitive
agents. Another promising direction is the improvement of man-machine
interaction by immersing the learner in virtual reality interfaces. This may give rise to a
new range of influential cognitive discrepancies or shifts, which will require identifying new
types of distinctive gaps.

Abstract: Sophisticated techniques from various areas of Artificial
Intelligence can be used to improve access to the WWW; the most
promising ones stem from Data Mining and Knowledge Modeling. We
describe the process of building two experimental systems: the VSEved
system for intelligent meta-search, and the VSEtecka system for
navigation support. We discuss our experience from this process, which
seems to justify the hypothesis that the Multi-Agent paradigm can
improve the efficiency of web access tools in the future. In this
respect, we outline a web-oriented multi-agent architecture.

Keywords: WWW Access, Data Mining, Knowledge Modeling, Meta- 
Search, Navigation Support, Agent Architecture. 



1 Introduction 

During the last few years, the World-Wide Web has become one of the most 
widespread technologies of information presentation. Making the enormous amount of 
information on the web really useful is inconceivable without intelligent assistance, 
both for end users and for maintainers of large sites. From this point of view, we can 
distinguish two basic groups of tasks that are frequently attacked by web applications: 

The most important user-oriented tasks are probably: 

1 . search (retrieval) of relevant documents, using one-shot queries 

2. filtering a stream of new documents against stable profiles 

3. navigation, i.e. support for the user during the browsing session 

4. question-answering, i.e. extraction of fine-grained data relevant to the user's
questions (see e.g. [Gaisauskas, Humphreys, 2000]); a particular form of question-answering
is passage retrieval, which is rather similar to document retrieval.

As maintainer-oriented tasks, we can view especially: 

1 . overall site auditing and maintenance 

2. low-level marketing tasks such as market-basket analysis of customers’ access. 



S. A. Cerri and D. Dochev (Eds.): AIMSA 2000, LNAI 1904, pp. 167-178, 2000.
© Springer-Verlag Berlin Heidelberg

A number of techniques can be used to solve these tasks (cf. e.g. [Chakrabarti,
2000]): statistically-grounded information retrieval methods, computational linguistics
and natural language processing methods, as well as more ad-hoc artificial intelligence
(AI) techniques. Within AI, the dominating paradigms seem to be those of data
mining and knowledge modeling.

1.1 WWW and Data Mining 

Zaiane and Han [Zaiane, Han, 1998] gave a useful taxonomy of data mining in the
web environment, i.e. web mining. They distinguish web content mining, web
structure mining and web usage mining.

The goal of web content mining is to extract knowledge from the web pages themselves.
The tasks related to this goal can be:

- web search and meta-search (find pages relevant to the user’s query), or filtering 
(recognize pages relevant to the user’s profile); this is the question of information 
retrieval, 

- text mining (find knowledge “hidden” in the pages); this is the question of 
information extraction or question answering. 

While the goal of information retrieval is to find relevant pages (strictly speaking, 
to find a set of pages with high precision and high recall), the goal of information 
extraction is to extract information from these pages. Text mining can be applied e.g. 
for discovering associations in collections of textual documents [Feldman, 1997]. 

Web structure mining means extracting knowledge from web structure and
hyperlinks. It has been observed that the web space is not homogeneously
interconnected. There are pages (called hubs) pointing to a large number of other
pages; there are pages (called authorities, i.e. referential for some areas of interest)
pointed to by a large number of pages. When doing web structure mining we can e.g. look for
regular patterns of links between such types of pages [Tomkins, 2000]. The
information about the structure can be useful for navigation. Methods from graph
theory are suitable for performing this kind of analysis.

The goal of web usage mining [Srivastava, 2000] is to discover access patterns
(from web server logs) and find paths frequently traversed by users. This task is very
similar to the market basket analysis performed in standard data mining: what are the
goods (pages) frequently purchased (visited) by the customers? The results of web
usage mining can be used in marketing (Amazon uses such an approach when
recommending similar books) and for reorganizing the web site (pages frequently
visited during one session should be linked together).
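The market-basket analogy can be illustrated with a few lines of Python that count which pairs of pages are visited together. This is a generic co-occurrence count, not a method taken from the cited work; the function name `frequent_page_pairs`, the `min_support` threshold and the sample paths are assumptions, and the sessions are taken as already extracted from the server log.

```python
from collections import Counter
from itertools import combinations


def frequent_page_pairs(sessions, min_support):
    """Count page pairs visited within the same session (the 'basket')."""
    counts = Counter()
    for pages in sessions:
        # Each page counts once per session; sorting makes pairs order-independent.
        for pair in combinations(sorted(set(pages)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


sessions = [
    ["/a", "/b", "/c"],
    ["/a", "/b"],
    ["/b", "/c"],
    ["/a", "/b", "/a"],
]
pairs = frequent_page_pairs(sessions, min_support=3)
```

Pairs that reach the support threshold are candidates for a direct link between the two pages when the site is reorganized.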



1.2 WWW and Knowledge Modeling 

The hot topics in current knowledge modeling are ontologies and problem-solving
methods, namely their construction, sharing and reuse.

The Web environment, containing a huge, diverse collection of textual documents,
is highly favorable for experiments in ontological engineering. Most research has
concentrated on the development of ontologies (hierarchical collections of concepts,
their relations and other elements) as a background for semantic unification of the web
content, in particular of metadata. In addition to „terminological“ ontologies (or
thesauri) used by digital libraries to annotate documents (such as the Dublin Core set
[Weibel et al., 1998]), more sophisticated „knowledge“ ontologies* have emerged as
a result of efforts within the AI community. The two seminal projects most widely
discussed are SHOE [Luke et al., 1997] and Ontobroker [Fensel et al., 1998]. The
central issue of both is the use of ontologies by authors of web pages in specifying
knowledge annotations (i.e. richly structured metadata) for the pages and their parts;
the difference between them lies in the construction and management of ontologies
(distributed in SHOE, centralized in Ontobroker). Ontologies are also used as a
conceptual grounding for factual knowledge bases, describing (the interesting parts
of) the web at the level of instances, for inferential knowledge bases^ describing
recurring patterns in web data, as well as for data mining tasks aiming at the discovery of
inferential knowledge.

In contrast, hardly any attention has been paid to web-specific problem-solving 
methods. In the project outlined in this paper, we plan to fill this gap via defining 
skeletal action plans for web access tasks, which will be refined and executed part-by- 
part by different agents in a multi-agent environment. 

There are a number of single-purpose systems oriented towards the particular tasks described
above. In this paper we present a slightly different attempt, using a multi-agent
architecture to solve several tasks simultaneously.

The paper is organized as follows. Section 2 describes the experimental VSEved 
system developed for „intelligent“ information retrieval (using the meta-search 
technology). Section 3 describes another system named VSEtecka - a browsing 
assistant, which is currently being developed in order to support the navigation in the 
web space via a collection of meta-information and links to associated pages. Section 
4 compares different tasks with respect to the (largely overlapping) input data they 
require. Finally, section 5 discusses the pros and cons of agent architecture, and 
suggests an agent-based model for the WWW information access, and section 6 
summarizes the whole work. 



2 The VSEved Meta-search System 

2.1 Search and Meta-search 

Due to the enormous growth of the web, finding information about a specific topic
can be extremely demanding. Search engines attempt to automate this task by means
of building (off-line; manually, e.g. Yahoo, or automatically, e.g. AltaVista) inverted
indices of WWW pages, which can then be searched according to words/phrases given
by the user. However, the use of search engines themselves entails significant
difficulties for a common (inexperienced) user. There is a vast number of search
engines at different locations, each with its own way of user interaction; the user thus
has to know which one to choose, how to reach it and how to use it. Moreover,
each engine is constrained by its own index, which usually covers only a small
fraction of the whole WWW space.

* See e.g. [Uschold, Gruninger 1996] for an elaborate typology of ontologies.

^ See e.g. [Harmelen, Fensel, 1999] for a discussion on conceptual, inferential and factual
knowledge on the web.

The idea of WWW meta-search has subsequently emerged to help users find
more relevant information in a more convenient way. The essence of all meta-search
systems consists in giving access to more than one search engine, while their other
features may vary. The typical models of meta-search are as follows:

- the user himself selects which search engine is to be queried (e.g. in the All-in-One
system: www.albany.net/allinone)

- the system itself queries all accessible search engines (e.g. the MetaCrawler system
[Etzioni, 1997])

- the system itself selects the most promising search engines to be queried (e.g. the
SavvySearch system [Howe, 1997])

- the system queries both its local database of frequently asked questions, and some
remote search engines (e.g. the AskJeeves system: www.askjeeves.com).

As the main advantages of meta-search we can view: 

- simultaneous submission of the query to different search engines 

- exploitation of search engines possibly unknown to the user 

- single interface on the user’s side 

- merging and sometimes further post-processing of returned information (lists of 
hits). 



2.2 Overview of the VSEved System 

We have developed an experimental system named VSEved, which combines usual 
WWW meta-search with Artificial Intelligence techniques. In the former, VSEved has 
been inspired mostly by AskJeeves, in particular in 

- directing the queries both to (multiple) remote search engines and to a local 
database of „direct answers“ 

- linguistic preprocessing of the query, which can be written in natural language. 

The local database of VSEved’s answers contains links to pages judged as
interesting with respect to the particular community of the users of the system (it has
been used mainly by university campus users). Linguistic preprocessing in VSEved
consists in language recognition (English or Czech), simple lemmatization (for Czech),
and extraction of linguistic Boolean operators (for Czech).

In addition, the hit lists returned by search engines are post-processed by means of
a rule-based expert system, which accounts for their „cleaning“ (removing duplicates
and dead links), integration, re-ordering (according to a „quality“ criterion based on
query-term coverage) and structuring, in order to provide more concise and
informative output to the user. Unlike other document-structuring systems that
perform unsupervised clustering [Zamir, Etzioni, 1998], [Honkela et al., 1996], we
have decided to use assignment to pre-defined categories (page types). Typological
categorization is one of the display options of the system, besides linear ordering
according to quality and (URL-) domain-based grouping. The process of typological
categorization will be described in more detail in the next two subsections; more
details of the VSEved system can be found in [Berka et al., 1999].
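The cleaning, integration and re-ordering steps can be pictured with the following sketch. VSEved itself does this with a rule-based expert system; the plain Python below is only an illustration, and the hit format (dicts with `url`, `title`, `snippet`), the URL normalization and the exact coverage formula are assumptions made for the example.

```python
def postprocess(hit_lists, query_terms, dead_links=frozenset()):
    """Merge hit lists, drop duplicates and dead links, order by term coverage."""
    merged = {}
    for hits in hit_lists:
        for hit in hits:
            # Crude normalization (assumption): ignore case and a trailing slash.
            key = hit["url"].rstrip("/").lower()
            if key in dead_links or key in merged:
                continue
            merged[key] = hit

    terms = [t.lower() for t in query_terms]

    def coverage(hit):
        # Share of query terms found in the title or snippet of the hit.
        text = (hit.get("title", "") + " " + hit.get("snippet", "")).lower()
        return sum(t in text for t in terms) / len(terms) if terms else 0.0

    # sorted() is stable, so ties keep the order in which the engines returned them.
    return sorted(merged.values(), key=coverage, reverse=True)


engine_1 = [{"url": "http://a.example/", "title": "Data mining"},
            {"url": "http://b.example", "title": "Web data mining tutorial"}]
engine_2 = [{"url": "http://A.example", "title": "Data mining"},
            {"url": "http://dead.example", "title": "web data mining"}]
ranked = postprocess([engine_1, engine_2], ["web", "mining"],
                     dead_links={"http://dead.example"})
```

In a real setting the set of dead links would come from checking the URLs, and the quality criterion could weigh more features than simple term coverage.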



2.3 VSEved and Knowledge Modeling 

The design of VSEved’s typological categorization has been grafted onto previous
knowledge modeling (ontological engineering) efforts. An ontology named WEB-ONT,
covering different aspects of the web (websites, logical documents, physical
pages, tag structures, addressing etc.), had been elaborated [Simek, Svatek, 1998] and
implemented in several languages (incl. e.g. SHOE). A part of this ontology, dealing
with page typologies, has been reused for constructing the categories that can be
assigned to the search hits retrieved (and thus to the pages they reference).
The three typologies are, in turn:

- Bibliographic categorization, operating on concepts like „article“, „bibliography“,
„image“, „pricelist“ or „newsgroup message“. This categorization was essentially
borrowed from the existing Dublin Core metadata system [Weibel et al., 1998]
(element ResourceType), which had been previously embedded into the WEB-ONT
ontology.

- Categorization according to the sphere of origin, such as „commercial“,
„academic“, „governmental“, „non-profit“ or „private“.

- Categorization according to technological type, such as „plain text“, „form“ or
„index“.

The operational rulebase (written in the CLIPS language) capable of recognizing 
(to a certain extent) these categories, has been constructed using, essentially, data- 
mining approaches. 



2.4 VSEved and Data Mining 

In the data-mining part of the VSEved project, we have attempted to build a 
rulebase relating web document types to the following information: 

- Terms from and structure of the URL - this part of the rulebase is generally 
applicable (i.e. not only for meta-search, but also for navigation, filtering etc.). 

- Other information returned by search engines (name, size, date and textual 
„snippet“ of the page) - this part is specific for the meta-search task. Using this 
information can lead to reduction of ambiguity in the categorization. 

Due to the structural nature (overall hit structure, plus linear structure of the URL,
name and „snippet“) of the data, Inductive Logic Programming (ILP) seems to be a
good choice for the learning approach. However, due to the high computational
complexity of ILP, we have decided to use 

- fast and straightforward frequency analysis of terms, pairs of terms and specific 
symbols from a large set of URLs, in the overall type-assignment task (the details 
of the frequency-analysis process can be found in [Svatek, Berka, 2000]) 

- sensitive but costly ILP, in the specific subtasks for which frequency analysis alone
led to ambiguous models - this was the case e.g. for the URL terms „art“, „pub“,
„cat“ or „bio“.

The overall rulebase thus consists of two layers: pure URL-based rules (applicable
to various categorization tasks) and rules encompassing general features identifiable
in search results (applicable to the meta-search categorization only). We have shown
in [Svatek, Berka, 2000] that even the first layer, consisting of a few dozen pure
URL-based rules, can more-or-less successfully assign some generic category to
approx. 70-90% (depending on the language and query-specificity settings) of pages
retrieved by search engines; 25-50% of the assignments account for Dublin-Core-like 
bibliographic categories. Future experiments will show the impact of the newly- 
introduced, ILP-based disambiguation on these figures. 
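
The frequency-analysis step described above can be illustrated with a minimal sketch. It is not the VSEved implementation: the tokenizer, the thresholds `min_count` and `min_share`, and the sample URLs are all assumptions made for illustration, and only single terms are counted (not pairs of terms or special symbols). The sketch shows how a term such as „pricelist“ yields a usable rule while a term such as „pub“ stays ambiguous and would be left to a costlier method such as ILP.

```python
import re
from collections import Counter, defaultdict

def url_terms(url):
    """Split a URL into lowercase alphabetic terms (hypothetical tokenizer)."""
    terms = re.split(r"[^a-z]+", url.lower())
    return [t for t in terms if t and t not in {"http", "https", "www"}]

def term_category_counts(labelled_urls):
    """Count how often each URL term co-occurs with each page category."""
    counts = defaultdict(Counter)
    for url, category in labelled_urls:
        for term in set(url_terms(url)):
            counts[term][category] += 1
    return counts

def unambiguous_rules(counts, min_count=2, min_share=0.8):
    """Keep terms that indicate one category in at least min_share of cases."""
    rules = {}
    for term, by_category in counts.items():
        total = sum(by_category.values())
        category, top = by_category.most_common(1)[0]
        if total >= min_count and top / total >= min_share:
            rules[term] = category
    return rules

# Illustrative, hand-labelled sample (not data from the project).
SAMPLE = [
    ("http://www.example.com/pricelist/2000.html", "pricelist"),
    ("http://shop.example.org/pricelist/index.htm", "pricelist"),
    ("http://www.example.edu/pub/paper1.html", "article"),
    ("http://www.example.net/pub/menu.html", "pricelist"),
]

counts = term_category_counts(SAMPLE)
rules = unambiguous_rules(counts)
```

With this sample, „pricelist“ becomes a rule, whereas „pub“ occurs with two different categories and is rejected by the `min_share` threshold.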



3 The VSEtecka System for Navigation Support 

3.1 Searching vs. Browsing 

Today's web search engines clearly separate the phases of searching for and using
(i.e. reading) information stored in „Web Space“. You must specify your query to
the search engine as precisely as you can. The search engine returns a list of pages
which may be of interest to you. Then you manually browse through this result. If
the pages found are not exactly on your subject, you must go back to the search engine,
refine your query and hope that the search results will be closer to your expectations.

The approach described above is used by most of today's search engines. But if we
integrate the intelligent search process more tightly with browsing, we will get an
environment which is much more effective, user-friendly and intuitive. Our aim is to
develop a navigation support system, VSEtecka, which will provide useful information
related to the page currently being viewed.



3.2 Navigation-Support Information 

Once finished, VSEtecka will provide the following information in an easily
accessible way:

Meta-information about the current page: Useful information related to the
current page alone, such as author, title, keywords and so on. The
meta-information set of VSEtecka will also include the same information as delivered by
the VSEved meta-search system, see the previous section.

Links to similar pages: The links to similar pages are categorized according to
several similarity criteria, such as content similarity, structural similarity and so on.
This functionality will allow the user to traverse similar pages without the need for an
explicit web search service.

Associations: Almost every web page is a part of a larger document or web-site. 
Not all pages are well designed and, as a result, they do not contain very helpful 
navigation links. This part of the system will automatically provide the links (if
appropriate) to the previous and next page in a sequence, to the table of contents of 
the document, to a relevant home page, and so on. This should prevent the user from 
getting lost on badly designed sites. 

User-definable filter: Filtering is an effective way of narrowing the scope of pages
offered in the similar and associated pages’ sections. Only the pages which satisfy the
restrictions set on the author’s name, page title, subject etc. (basically, any of the
meta-information fields can be used here) will be supplied to the user.

Domain-specific information: In many domains, specific page categories and 
relations among them are important for the user’s understanding of the site structure, 
in addition to generic ones (which are listed under „associations“). This may be the 
case e.g. for course, lecturer and department pages in an academic environment (see 
e.g. [Craven et al., 1998]), or for company homepage, pricelist and press-releases’ 
page in a business environment. 



3.3 Notes on Implementation and Data Input

For the system to operate conveniently on-line, the responses to the user’s actions must
be fast. Navigation information on the pane has to be updated within a few seconds after a
new page has been loaded into the browser. This makes the option of collecting the
necessary data on-line for each request rather problematic. Instead, we assume that the
system will mostly rely on a web-crawler that will scan a pre-defined subspace of the
web: currently, we index the pages on (a huge number of) web-servers within the
campus of the University of Economics, Prague. The information will be extracted
from these pages using a synergy of AI methods, and stored in a factual knowledge-base
(FKB), analogous to the knowledge base currently built within the WebKB project.
The existence of the FKB will guarantee a fast response to user requests that fall
within its scope. For requests outside its scope, the system will either have
to perform on-line document retrieval and analysis, or will rely on limited information,
such as URLs and anchor text of links (see [Svatek, Berka, 2000]), in a similar way to
the VSEved system.



4 Search, Navigation and Beyond: Overlap of Web-Mining 
Tasks 

The experience with the running prototype of the meta-search system, as well as the
design of the (not yet finished) navigation support system, have led us to the
conclusion that the input-data requirements vary but overlap significantly across
different web-related tasks. Let us enumerate the most characteristic information
resources, as identified by us and others (e.g. [Chakrabarti, 2000], [Craven et al.,
1998]), in the rough order of increasing grain-size:

- Elementary data types, which (in particular, the first two) appear in different places 
and situations around the „web-world“; these mainly amount to 

1. text in a natural language

2. URLs

3. images and other multimedia information 

- Static data structures: 

1. tag structure of pages, expressing (namely, HTML in the common way of use) 
display formatting and/or logical structure 

2. explicit, more-or-less standardized metadata on pages 

3. frequency count of linguistic terms, tag structures and non-text objects, as well as
more abstract concepts (such as „personal names“ or „downward links“), within
pages

4. topology on the space of pages, induced by interconnecting links 

- Dynamic data structures: 

1. user-access data (web server or proxy logs)

2. search queries and their results 

Let us then match the data with the (somewhat refined) list of web-related tasks
from section 1:

- Meta-search with „limited information“ post-processing, as performed by the
VSEved system described above, uses URLs, page titles, and, possibly, a short (usually
almost worthless from the point of view of linguistic analysis) text snippet from the
beginning of the page.

- Common search engine indexing, as well as some other off-line tasks such as 
filtering, process page fulltexts, thus yielding word frequencies, but possibly also
linguistic, HTML and link structures; some search engines also rely on information 
about the users’ choice. 

- Navigation support as well as all site-maintenance oriented tasks put stress on the 
link structure; the latter are often based on usage mining, i.e. exploitation of user 
access data. 

- Marketing-related tasks always require user access data. 

The overlap of lower-level data analysis tasks is, in our opinion, a strong incentive
for the development of modular (agent) architectures for web analysis and access.
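
The overlap can be made concrete with a small sketch that records, for each task, the data resources named in the list above and then computes which pairs of tasks share inputs. The grouping of tasks and the resource names are a simplified reading of the text, not a definitive inventory.

```python
from itertools import combinations

# Data resources per task, as read from the list above (simplified assumption).
TASK_INPUTS = {
    "meta-search": {"URL", "page title", "text snippet"},
    "indexing and filtering": {"fulltext", "tag structure", "link structure",
                               "user choice"},
    "navigation support": {"link structure"},
    "site maintenance": {"link structure", "user access data"},
    "marketing": {"user access data"},
}

def shared_inputs(task_inputs):
    """Return the data resources shared by every pair of tasks that overlap."""
    overlaps = {}
    for first, second in combinations(sorted(task_inputs), 2):
        common = task_inputs[first] & task_inputs[second]
        if common:
            overlaps[(first, second)] = common
    return overlaps

overlaps = shared_inputs(TASK_INPUTS)
```

Each shared resource is a candidate for a common lower-level analysis module that several task-specific components could reuse.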



5 Analyzing and Accessing the Web Using Agents - Discussion 

Earlier in this paper, we have mentioned several powerful techniques which make it
possible to exploit information hidden in the web more effectively. Somewhat
surprisingly, current search engines incorporate only a few of them. The most often
used method for web search is still the ancient word or term indexing. The current success
of the Google search engine [Chakrabarti, 2000] (in particular, in winning over AltaVista
users) proves that using more sophisticated techniques can provide a competitive advantage.
The following reasons may slow down the implementation of modern algorithms:

- Each technique may require different data representation 

- Users still expect an integrated result (therefore the result aggregation must be
implemented)

- Sophisticated algorithms are likely to require a lot of computer power 

How can we build an integrated, powerful and efficient system? We are constrained
by the need for several different data representations. For example, we use a word index
for classical fulltext search but a kind of associative net for storing information
about web structure. It is difficult to predict all possible representations required in the
future. For each data representation we require a different kind of processing. This led
us to consider many agents, each specialized for a specific web searching or
mining task. As a first step we plan to develop a multi-agent architecture which should
enable:

1. Bottom-up development model

2. Possibility of gradual evolution of the system and easy adding of new features 

3. Natural decomposition of the system based on data representation 

4. Communication framework for query decomposition and result aggregation 

In addition we believe that competition or cooperation of the specialized agents can 
bring better results than a single process, single representation system. 

Schema 1: current preliminary version of the multi-agent system

Basically there are several classes of agents: 

1. Data harvesting / preprocessing / storing in a standardized format

2. Query decomposer and dispatcher / result aggregator 

3. Analytical agents (performing specific tasks) 

4. Helpers (performing useful common tasks like linguistic analysis) 

As we want the analytical modules to focus just on their task, we provide a
standardized data format. For example, all harvested web pages are checked for errors
and converted to canonical XML. The meta-information about pages is also saved in
the pre-defined format. In the future the web-harvesting module should also serve as a
cache for requested documents which are temporarily inaccessible (which is not a rare
case).

For the same reason we develop helper agents which should resolve common 
requests from analytical agents e.g. a specialized text parsing or linguistic processing. 

The analytical agents can be classified by the generality of the task for which they are
responsible. For example, the structure-analyzing agent can solve a rather wide range of
tasks (e.g. find the nearest neighbors by mutual hyperlinks or by similar hyperlinks to
other pages). On the other hand, it should be possible to add an agent for a task as
specific as answering who is the chair of department X of university Y
(which can be implemented as filling this pattern by standard methods of IE).
Currently we are finishing the specifications of the analytical agents which should be
implemented first to check the agent model features. A brief example of basic agents
is included further in this paper.

One of the most difficult parts of implementing the suggested multi-agent system
will be designing the agent responsible for communication with the user. This task
involves understanding the user query and dispatching it to the right analytical
agents. Conversely, the results of the specific agents must be aggregated into a
qualified answer. We believe that the quality of answers is the most important goal of the
described system.

Examples of analytical agents 

- Word indexing agent - the basic agent which can answer questions like: Which
documents contain the word X? Which document is the most relevant to the
word X? In the most basic version this is nothing new but it will serve mainly for
testing the communication among agents. In the future this agent will be upgraded
to work with terms or possibly concepts and will use the text parsing and linguistic
agents.

- Structure analyzing agent - will answer questions like: Which pages are near to this
page in terms of the number of clicks needed to get there? Which pages cite the same
addresses as this page does? This agent should store the web structure in the form
of a multidimensional hypergraph. Its answers can be used, for example, for sorting
final results by number of citations.
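
The two example agents can be sketched as follows. This is only an illustration of the questions they are meant to answer, not the planned implementation: the class and method names are hypothetical, relevance is approximated by a raw word count, and the web structure is held in a plain adjacency map rather than the multidimensional hypergraph mentioned above.

```python
from collections import defaultdict, deque

class WordIndexAgent:
    """Answers: which documents contain word X, and which is most relevant?"""

    def __init__(self):
        self.index = defaultdict(dict)  # word -> {document id: occurrence count}

    def add_document(self, doc_id, text):
        for word in text.lower().split():
            self.index[word][doc_id] = self.index[word].get(doc_id, 0) + 1

    def documents_with(self, word):
        return set(self.index.get(word.lower(), {}))

    def most_relevant(self, word):
        # Assumption: relevance is simply the number of occurrences.
        postings = self.index.get(word.lower(), {})
        return max(postings, key=postings.get) if postings else None

class StructureAgent:
    """Answers: how many clicks apart are two pages, and who cites the same pages?"""

    def __init__(self):
        self.links = defaultdict(set)  # source page -> set of target pages

    def add_link(self, source, target):
        self.links[source].add(target)

    def click_distance(self, start, goal):
        # Breadth-first search over hyperlinks; None if goal is unreachable.
        seen, queue = {start}, deque([(start, 0)])
        while queue:
            page, distance = queue.popleft()
            if page == goal:
                return distance
            for target in self.links.get(page, ()):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, distance + 1))
        return None

    def co_citing(self, page):
        # Pages that link to at least one address this page also links to.
        targets = self.links.get(page, set())
        return {other for other, out in self.links.items()
                if other != page and out & targets}

words = WordIndexAgent()
words.add_document("a.html", "agents mine the web")
words.add_document("b.html", "web agents index web pages")

structure = StructureAgent()
for source, target in [("a.html", "b.html"), ("b.html", "c.html"),
                       ("d.html", "c.html")]:
    structure.add_link(source, target)
```

A dispatcher agent could send a keyword query to `WordIndexAgent` and then ask `StructureAgent` to rank or extend the hits, which is the kind of query decomposition and result aggregation discussed above.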

We anticipate some drawbacks of the suggested solution. In particular:

1. The agent communication leads to overhead in processing time

2. The different data representations lead to overhead in data store 

3. It is difficult to make global optimizations of query processing 

We expect that the possible problems will arise as soon as we start to test the
minimal prototype with just a few analytical agents and the basic dispatcher agent. We
expect that most of the problems can be solved and the system can be tuned in this
minimal setting. We plan to exploit the experience to design a maximally robust and
efficient architecture which could be used as a framework for building powerful
search engines and as a testbed for testing modern and sophisticated AI techniques for
exploiting the information richness of the Internet.

6 Conclusions

We have analyzed the problem of improving WWW access using AI
techniques, with stress on the most common web-related tasks. For two of these tasks we
have designed experimental systems: the VSEved system for intelligent meta-search,
and the VSEtecka system for navigation support. We have discussed our experience from this
process, which seems to justify the hypothesis that the multi-agent paradigm can
improve the efficiency of web access tools.

In the future, we would like to verify this hypothesis by implementing such a
modular architecture, which will also embed the components of the above-mentioned
meta-search and navigation systems.

Abstract. With the growing popularity of the World Wide Web, the
number of semistructured documents produced in all types of organiza- 
tions increases at a rapid rate. However the provided information can- 
not be queried or manipulated in the general way since, although there 
is some structure in the information, it is too irregular to be modeled 
using a relational or an object-oriented approach. Nevertheless, some 
semistructured objects, for the same type of information, have a very 
similar structure. In this paper we address the problem of finding such 
regularities and we propose a general architecture based on a very effi- 
cient data mining technique. 



1 Introduction 

With the growing popularity of the World Wide Web (Web), the number of 
semistructured documents produced increases at a rapid rate. While in classi- 
cal database applications we first describe the structure of data, i.e. type or 
schema, and then create instances of that type, in semistructured data, data 
has no absolute schema and each object contains its own structure [Wor97]. 
Nevertheless, some semistructured objects, for the same type of information, 
have a very similar structure. Analysis of such regularities in such semistruc- 
tured objects can provide significant and useful information for restructuring a 
Web site for increased effectiveness [WL98], for improving any meaningful query
for Web documents [KS95], for providing a guideline for building indexes and 
views [Abi97], etc. 

The groundwork of the approach presented in this paper is the problem of mining 
structural regularities in a large set of semistructured objects extracted from the 
Web. More precisely, we want to discover, using data mining techniques, graph 
structures appearing in some minimum number of objects. We must point out
that this approach is very different from those on extracting the structure of a
single individual object since we consider in the following that we are provided 
with a large collection of graph structures [Wor97]. 

The rest of this paper is organized as follows. In section 2, the problem is stated 
and illustrated. The approach is described in section 3. Related work is briefly 
presented in section 4. Finally section 5 concludes the paper. 

2 Problem Statement

In this section we give the definition of the structural association mining
problem. First we formalize semistructured data, largely following
the formal description of the Object Exchange Model (OEM) defined for
representing structured data [AQM+97,ABS00]. Second we look at the structural
association mining problem in detail. A concrete example is also provided.



2.1 Preliminary Definitions 

The data model that we use is based on the OEM model designed specifically 
for representing semistructured data. We assume that every object o is a tuple 
consisting of an identifier, a type and a value. The identifier uniquely identifies 
the object. The type is either complex or some identifier denoting an atomic 
type (like integer, string, gif-image, etc.). When type is complex then the object 
is called a complex object and value is a set (or list) of identifiers. Otherwise the 
object is an atomic object, and its value is an atomic value of that type. As we 
consider set semantics as well as list semantics, we use a circle node to represent
an identifier of a set value and a square node to represent an identifier of a
list value. We can thus consider an OEM graph as a graph where the nodes are
the objects and the labels are on the edges. In this paper we assume that there 
is no cycle in the OEM graph. 

We also require that: (i) identifier(o) and value(o) denote the identifier and
value of the object o; (ii) object(id) denotes the unique object with an identifier
id; (iii) each atomic object has no outgoing edges; (iv) if two edges connect the
same pair of nodes in the same direction then they must have different labels.
We thus assume that we are provided with a labeling function $F_E : E \rightarrow L_E$
where $L_E$ is the domain of edge labels.
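
The data model and its requirements can be sketched in a few lines. This is an illustrative encoding only: the class name `OEMObject`, the representation of a complex value as (label, identifier) pairs, and the sample identifiers are assumptions, not part of the OEM definition.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Edge = Tuple[str, str]  # (edge label, identifier of the referenced object)

@dataclass(frozen=True)
class OEMObject:
    identifier: str
    type: str                                 # "complex" or an atomic type name
    value: Union[str, int, Tuple[Edge, ...]]  # edges if complex, else atomic value

    def edges(self) -> Tuple[Edge, ...]:
        """Outgoing edges; atomic objects have none (requirement iii)."""
        return self.value if self.type == "complex" else ()

def has_duplicate_edges(obj):
    """Requirement (iv): two edges to the same node must differ in label."""
    edges = obj.edges()
    return len(set(edges)) != len(edges)

def is_acyclic(objects):
    """Depth-first check that object references never loop back."""
    state = {}

    def visit(oid):
        if state.get(oid) == "open":   # reached an object still being explored
            return False
        if state.get(oid) == "done":
            return True
        state[oid] = "open"
        ok = all(visit(target) for _, target in objects[oid].edges())
        state[oid] = "done"
        return ok

    return all(visit(oid) for oid in objects)

# Hypothetical objects, keyed by identifier so that object(id) is a lookup.
objects = {
    "&a": OEMObject("&a", "complex", (("name", "&b"), ("member", "&c"))),
    "&b": OEMObject("&b", "string", "hypothetical value"),
    "&c": OEMObject("&c", "complex", ()),
}
```

Keeping the objects in a dictionary keyed by identifier makes requirement (ii) explicit: each identifier maps to exactly one object.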

Figure 1 shows a segment of information about a collection of three
persons [ABS00]. Each circle along with the text inside it represents an object
and its identifier. The arrows and their labels represent object references. For
instance, value(&o5) = “Mary” and value(&o2) = “setof”, meaning that name,
age, child and child may be unordered. The structures are thus composed in the
following way:



{person: &o1 {name: Mary,
              age: LO,
              child: &o2,
              child: &o3},
 person: &o2 {name: John,
              age: 17,
              child: &o2,
              relatives: {mother: &o1,
                          sister: &o3}},
 person: &o3 {name: Jane,
              country: Canada,
              mother: &o1}
}

Fig. 1. An OEM graph



A single path in the graph is an alternating sequence of objects and labels
$< o_1 l_1 o_2 \ldots o_{k-1} l_{k-1} o_k >$ beginning and ending with objects, in which each label
is incident with the two nodes immediately preceding and following it. Such a
path $p_k$ is called a path expression. The number of labels from the source object
to the target node in a path, $k$, is the length of the path. As we consider nested
structures, we can consider that the length is similar to the nesting level of the
structure. Let $P_k$ be the set of all paths p where the length of p is k. We now
consider multiple paths, defined as follows: a multiple path expression (or path for
short) is a set of single paths such that the source object is the same in all the
single paths¹. The length of the multiple path is the maximal length of all single
paths. As we are only interested in structural regularities, in the following we do
not consider atomic values anymore and we use the symbol ⊥ in order to denote an
atomic value in the graph.

Example 1 Let us consider figure 1. From the object &o5, we have the following
path expression: {name: ⊥, age: ⊥, child: {name: ⊥, age: ⊥}, child: {
name: ⊥, country: ⊥}}.

A multiple path expression $p_m$ is a sub-path expression of another multiple
path expression $p_n$ if every object of $p_m$ is included in $p_n$, where the inclusion
is defined as follows. If the object is a set value, such as $\{x_1, \ldots, x_l\} \subseteq p_m$ and
$\{x'_1, \ldots, x'_k\} \subseteq p_n$, then the object of $p_m$ is included in the object of $p_n$ if and
only if every $x_i$ is a subset of some $x'_j$. If the object is a list value, such as
$< x_1, \ldots, x_l > \subseteq p_m$ and $< x'_1, \ldots, x'_k > \subseteq p_n$, then the object of $p_m$ is included
in the object of $p_n$ if and only if there exist integers $i_1 < i_2 < \ldots < i_l$ such
that $x_1 \subseteq x'_{i_1}, x_2 \subseteq x'_{i_2}, \ldots, x_l \subseteq x'_{i_l}$. Furthermore we assume that an atomic
value is included in itself.

¹ In fact, we may consider that a multiple path expression is an OEM graph where
the root of the graph is the source object of single paths embedded in the multiple
path expression.

Let DB be a set of transactions where each transaction T consists of a
transaction-id and a multiple path expression embedded in the OEM graph and
involved in the transaction. All transactions are sorted in increasing order and
are called a data sequence. Figure 2 gives an example of transactions embedded
in DB. A support value (supp(s)) for a multiple path expression in the OEM
graph gives its number of actual occurrences in DB. In other words, the support
of a path p is defined as the fraction of total data sequences that contain p. A data
sequence contains a path p if p is a sub-path expression of the data sequence.
In order to decide whether a path is frequent or not, a minimum support value
(minSupp) is specified by the user, and the multiple path expression is said to be
frequent if the condition $supp(s) \geq minSupp$ holds.

Problem statement: Given a database DB of customer transactions, the
problem of regularity mining, called Schema mining, is to find all maximal
paths occurring in DB whose support is at least a specified threshold
(minimum support). Each such path represents a frequent path.

From the problem statement presented so far, discovering structural
association sequential patterns closely resembles mining association
rules [AIS93,AS94,BMUT97,FPSSU96,SON95,Toi96] or sequential
patterns [AS95,SA96]. However, elements of the handled association transactions have
structures in the form of labeled hierarchical objects, and a main difference is
introduced with partially ordered references. In other words, we have to take into
account complex structures, while in association rules or sequential patterns the
elements are atomic, i.e. flat sets or lists of items.



Trans_id   path expressions

t1         person: {identity: {name: ⊥, address: ⊥}}
t2         person: {identity: {name: ⊥, address: < street: ⊥, zipcode: ⊥ >,
           company: ⊥, director: < name: ⊥, firstname: ⊥ >}}
t3         person: {identity: {id: ⊥, address: < street: ⊥, zipcode: ⊥ >}}
t4         person: {identity: {name: ⊥, address: ⊥, company: ⊥}}
t5         person: {identity: {name: ⊥, address: ⊥}}
t6         person: {identity: {name: ⊥, address: < street: ⊥, zipcode: ⊥ >,
           director: < name: ⊥, firstname: ⊥ >}}



Fig. 2. A transaction database 



An Example: In order to illustrate the problem, let us consider the base DB
given in figure 2, reporting transactions about a population reduced to just
six. Let us assume that the minimum support value is 50%; thus, to be considered
frequent, a path must be observed in at least 3 transactions. Let us now
consider the associated OEM graph given in figure 3. The only frequent paths
embedded in the DB are the following: identity: {name: ⊥, address: ⊥} and
identity: {address: < street: ⊥, zipcode: ⊥ >}. The first one is discovered
because it matches the first transaction t1 while also being detected in t2 and t6.
In the same way, identity: {address: < street: ⊥, zipcode: ⊥ >} is supported
by transactions t2, t3 and t6. By contrast, the multiple path expression identity:
{name: ⊥, address: < street: ⊥, zipcode: ⊥ >, director: < name: ⊥, firstname:
⊥ >} is supported by transactions t2 and t6, but it is not frequent since the number
of data sequences supporting this path does not satisfy the minimum support
constraint.

Fig. 3. An OEM graph

3 Principles 

For presenting our approach, we adopt the chronological viewpoint of data 
processing: from collected raw data to exhibited knowledge. We consider 
that the mechanism for discovering regularities on semistructured data in a 
large database is a 2-phase process. The starting point of the former phase is 
semistructured objects extracted from sources on the Web and collected in a 
large file. 

From such a file, the mapping phase performs a transformation of the original data.
It results in a newly populated database containing the meaningful remaining
data. From such a database, a data mining technique is applied in order to extract
useful regularities on graph structures. The architecture of our approach is
depicted in figure 4.



Fig. 4. General Architecture 



3.1 Data Extraction 

The structure of each object embedded in the source must be extracted during
this phase. First of all, a data filtering step is performed in order to filter out
irrelevant semistructured data such as image, video, sound, etc. Furthermore,
substructures that are not interesting from the end user's point of view are also
pruned out.

In order to perform such an extraction, we assume that the end user is provided
with a parser or a wrapper. Of course, this parser requires the implementation
of efficient algorithms for extracting graph structure. During our experiments,
we assumed that such extraction has been done. For instance, in [HGMC+97], a
very efficient tool for extracting semistructured data from a set of HTML pages
and for converting the extracted information into an OEM graph is presented.
This extraction is done on different sources in order to provide a large collection
of graph structures in which we want to discover substructures appearing in some
number of graph structures.

3.2 Knowledge Discovery 

From the data yielded by the extraction phase, a data mining technique is applied
to fully meet the analyst's needs.

We split the problem of mining structural association of semistructured data in 
a large database into the following sub-phases: mapping and data mining. 

Mapping phase: The transaction database is sorted with transaction id
as a major key, and values embedded in a set-of are sorted according to the
lexicographic order. In order to efficiently find structural regularity among path
expressions, each path expression is mapped in the following way. If the object
value is a part of a set-of value, then we prefix the label with an ’S’ (resp. an ’L’ for
a list-of value). Furthermore, in order to take into account the level of the label
in the transaction, we append to the label an integer standing for the level
of the label in the nested structure. When two labels occur at the same level
and the second one directly follows the first one, they are grouped together in
the same set; otherwise they form a new set of labels. The ordering of the list-of
value is taken into account by creating a new set of labels. The “composite”
transaction which results from the union of such sets obtained from the original
transaction describes a sequence of labels, and the ordering of such a sequence may
be seen as the way of navigating through the path expression.

This step converts the original database into a database D of
data-sequences where each data-sequence describes the ordering of labels in the
transaction according to the path expression.
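
The mapping rules can be sketched as a short recursive traversal. This is one possible reading of the rules rather than the authors' code: the tagged-tuple encoding (`S(...)` for set-of values, `L(...)` for list-of values, `ATOM` for an atomic value) and the function name `map_path` are assumptions, and the lexicographic sort is left out so that labels keep the order in which they are written.

```python
ATOM = None  # stands for an atomic value

def S(*members):
    """A set-of value: members are (label, child) pairs."""
    return ("S", list(members))

def L(*members):
    """A list-of value: members are (label, child) pairs, order matters."""
    return ("L", list(members))

def map_path(expression):
    """Flatten a nested path expression into a sequence of label sets."""
    sequence = []
    last = None  # (level, kind) of the label emitted most recently

    def walk(node, level):
        nonlocal last
        kind, members = node
        for label, child in members:
            token = f"{kind}{label}{level}"
            if kind == "S" and last == (level, "S"):
                # Directly follows a set-of label of the same level: same group.
                sequence[-1].append(token)
            else:
                # List-of labels and non-adjacent labels open a new group.
                sequence.append([token])
            last = (level, kind)
            if child is not ATOM:
                walk(child, level + 1)

    walk(expression, 1)
    return sequence

# person: {identity: {name, address: <street, zipcode>, company,
#                     director: <name, firstname>}}
T2 = S(("identity", S(
    ("name", ATOM),
    ("address", L(("street", ATOM), ("zipcode", ATOM))),
    ("company", ATOM),
    ("director", L(("name", ATOM), ("firstname", ATOM))),
)))

mapped = map_path(T2)
```

For this expression the groups come out as (Sidentity1) (Sname2 Saddress2) (Lstreet3) (Lzipcode3) (Scompany2 Sdirector2) (Lname3) (Lfirstname3).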

Figure 5 describes the database of our previous example after the mapping phase. 



Trans_id   path expressions mapped

t1         (Sidentity1) (Sname2 Saddress2)
t2         (Sidentity1) (Sname2 Saddress2) (Lstreet3) (Lzipcode3)
           (Scompany2 Sdirector2) (Lname3) (Lfirstname3)
t3         (Sidentity1) (Sid2 Saddress2) (Lstreet3) (Lzipcode3)
t4         (Sidentity1) (Sname2 Saddress2 Scompany2)
t5         (Sidentity1) (Sname2 Saddress2)
t6         (Sidentity1) (Sname2 Saddress2) (Lstreet3) (Lzipcode3)
           (Sdirector2) (Lname3) (Lfirstname3)



Fig. 5. A transaction database mapped 



Mining Phase: From the database obtained from the mapping phase,
the problem closely resembles mining association rules (also known as the
market-basket problem), which was initially introduced in [AIS93], where an
association could be seen as a relationship between facts embedded in the database.
Nevertheless, our problem is quite different since we have to take into account
the hierarchical structure. In fact, our approach is very similar to the problem of
sequential patterns, which was introduced to capture typical behaviour over time,
i.e. behaviours sufficiently repeated by individuals to be relevant for the decision
maker [AS95]. In this context, we assume that we are given a database D of
customers’ transactions, each of which has the following characteristics:
sequence-id or customer-id, transaction-time and the items involved in the
transaction. Such a database is called a base of data sequences (cf. Fig. 2). To
aid decision making efficiently, the aim is to discard non-typical behaviours
according to the user’s viewpoint. Performing such a task requires providing each data
sub-sequence s in the DB with a support value (supp(s)) giving its number of
actual occurrences in the DB. In order to decide whether a sequence is frequent
or not, a minimum support value ($\sigma$) is specified by the user, and the sequence s is
said to be frequent if the condition $supp(s) \geq \sigma$ holds.

The interested reader could refer to [AS95,SA96,MCP98] in which approaches 
for exhibiting sequences are presented and compared. 

Our approach for mining structural regularities fully follows the
fundamental principles of the sequential pattern problem. In order to improve the
efficiency of retrievals we use an algorithm, called PSP, that we defined for mining
sequential patterns. It follows the principles of GSP [SA96] but it makes use
of a different intermediary data structure which has proved to be more efficient
than the one in GSP. Due to lack of space we do not detail the algorithm, but the
interested reader may refer to [MCP98,MPC99].

As an illustration, when running our algorithm on the database of
figure 5 with a support value of 50%, we obtain the following
frequent paths: < (Sidentity1) (Sname2 Saddress2) > and
< (Sidentity1) (Saddress2) (Lstreet3) (Lzipcode3) >. These results may
then be transformed according to information obtained from the mapping phase,
and we are thus provided with the following semistructured graphs: identity:
{name: ⊥, address: ⊥} and identity: {address: < street: ⊥, zipcode: ⊥ >}.
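
The support figures of this illustration can be checked with a brute-force containment test over the mapped database of figure 5. The sketch below is not PSP or GSP: it does not generate candidates, it only counts which data sequences contain a given pattern. The helper names `seq`, `contains` and `supporters` are assumptions made for the example.

```python
import math

def seq(*groups):
    """Build a sequence: one set of labels per whitespace-separated group."""
    return [set(group.split()) for group in groups]

# Mapped transactions of figure 5.
DB = {
    "t1": seq("Sidentity1", "Sname2 Saddress2"),
    "t2": seq("Sidentity1", "Sname2 Saddress2", "Lstreet3", "Lzipcode3",
              "Scompany2 Sdirector2", "Lname3", "Lfirstname3"),
    "t3": seq("Sidentity1", "Sid2 Saddress2", "Lstreet3", "Lzipcode3"),
    "t4": seq("Sidentity1", "Sname2 Saddress2 Scompany2"),
    "t5": seq("Sidentity1", "Sname2 Saddress2"),
    "t6": seq("Sidentity1", "Sname2 Saddress2", "Lstreet3", "Lzipcode3",
              "Sdirector2", "Lname3", "Lfirstname3"),
}

def contains(data_sequence, pattern):
    """True if the pattern's label sets fit, in order, into distinct sets."""
    position = 0
    for labels in pattern:
        # Greedily take the earliest set that includes the wanted labels.
        while position < len(data_sequence) and not labels <= data_sequence[position]:
            position += 1
        if position == len(data_sequence):
            return False
        position += 1
    return True

def supporters(pattern):
    """Identifiers of the data sequences that contain the pattern."""
    return sorted(tid for tid, data in DB.items() if contains(data, pattern))

MIN_COUNT = math.ceil(0.5 * len(DB))  # 50% of six transactions

name_address = seq("Sidentity1", "Sname2 Saddress2")
street_zip = seq("Sidentity1", "Saddress2", "Lstreet3", "Lzipcode3")
with_director = seq("Sidentity1", "Sname2 Saddress2", "Lstreet3", "Lzipcode3",
                    "Sdirector2", "Lname3", "Lfirstname3")
```

Under these assumptions the first two patterns reach the threshold of three data sequences, while the pattern that also includes the director is contained only in t2 and t6.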

The data mining algorithm is implemented in GNU C++ and preliminary
results show that the approach is efficient. Figure 6 reports
experiments conducted on the Internet Movie Database [WL99]. We gathered
information about actors (500 different actors were examined). For instance, we can
notice that more than 250 actors have a name, a date of birth and a filmography
as well as a notable appearance on TV.



Support   Frequent paths

20%       * {actor: {name, birthname, dateofbirth, filmographyas: {title, notabletv}}}
          * {actor: {name, dateofbirth, dateofdeath, filmographyas: {title, notabletv}}}
          * {actor: {name, dateofbirth, minibibliography, filmographyas: {title, notabletv}}}
          * {actor: {name, dateofbirth, sometimescreditas: {name}, filmographyas: {title}}}
          * {actor: {name, dateofbirth, trivia, filmographyas: {title, notabletv}}}
          * {actor: {name, sometimescreditas: {name}, filmographyas: {title, notabletv}}}

30%       * {actor: {name, birthname, dateofbirth, filmographyas: {title}}}
          * {actor: {name, dateofbirth, filmographyas: {title, notabletv}}}
          * {actor: {name, trivia, filmographyas: {title}}}

40%       * {actor: {name, dateofbirth, filmographyas: {title, notabletv}}}

50%       * {actor: {name, dateofbirth, filmographyas: {title, notabletv}}}

Fig. 6. Results of experiments on the Internet Movie Database

4 Related Work

To the best of our knowledge there is little work on mining such structural
regularities in a large database. Nevertheless, our work is closely related to the
problem of mining structural associations of semistructured data proposed
in [WL98,WL99]. The authors propose a very efficient approach and solutions
based on a new representation of the search space. Furthermore, they give some
pruning strategies in order to improve candidate generation. Our work nevertheless
has some important differences. Unlike their approach, we are interested in all
structures embedded in the database, while they are interested in mining tree
expressions, which are defined as paths from the root of the OEM graph to the
atomic values. With this definition of tree expressions they cannot
find regularities such as identity: {address: < street: ⊥, zipcode: ⊥ >}. In
fact, when parsing the database in order to find frequent trees, they are only
provided with maximal trees, and when only a part of a tree is frequent it is
not discovered.



5 Conclusion 

In this paper we present an approach for mining regularities of semistructured 
objects in a large database. This approach is based on a data mining technique 
and preliminary results show that such a technique may be useful in order to 
discover schema regularities on the Web. We have defined the problem and 
proposed a very efficient approach to solve it.


Abstract. The paper discusses the potential of Extended
Boolean operations for personalized information delivery on the Internet
based on semantic vector representation models. The final goal is the
design of an e-commerce portal that tracks users' clickstream activity and
purchase history in order to offer them personalized information. The
emphasis is put on the introduction of a dynamic composite user profile
constructed by means of extended Boolean operations. The basic binary
Boolean operations such as OR, AND and NOT (AND-NOT) and their
combinations have been introduced and implemented in a variety of ways.
An evaluation is presented based on the classic Latent Semantic Indexing
method for information retrieval using a text corpus of religious and
sacred texts.



1 Introduction 

The pre-Internet era imperative stated that more data means a better chance of
finding the information needed. The Internet has imposed new standards and a new
way of thinking. In 1994 the World Wide Web Worm received an average of about
1500 queries per day; in November 1997 only one of the top four commercial
search engines found itself (returned its own search page in response to its name
in the top ten results); and nowadays the AltaVista search engine serves hundreds
of millions of queries per day. With the enormous growth of the information
available on the Web the goal has changed, and the main efforts are directed
towards limiting the information presented to the user [5]. The first to feel
the problem were of course the search engines, and they offered users several
possibilities for advanced query refinement. Unfortunately their usage remained
highly limited since, as Marchionini argued: "End users want to achieve their
goals with a minimum of cognitive load and a maximum of enjoyment. ... humans
seek the path of least cognitive resistance and prefer recognition tasks to recall
tasks; most people will trade time to minimize complexity" [17]. The problem
of the relevance of information presented to the users was well understood by
the commercial Internet sites. When people find a magazine irrelevant to
their information expectations they simply stop buying it. It is the same with
Web sites: if the information presented does not meet the customers' needs
they never return. The limited volume of a magazine does not permit
including everything people would find relevant, so magazines tend to specialize
in a particular area. People buy only those magazines that are relevant to their
specific interests. Web sites and portals are a different case because there
are no such strong limits on the amount of information that can be published. The
biggest Internet portals like those of Yahoo!, MSN or Netscape can offer almost
everything a customer may need. The problem is how to organize the site in order
to help users find what they are actually looking for.

2 The Idea of the User Profile 

The most valuable solution is the development of a dynamic model of the
interests of the specific user. The first attempts at developing such a profile
asked the user to enter some words that best describe his or her interests.
Another possibility is to have the user select the relevant ones among a set of
articles. Each article can be assigned a list of key words that will be used to
limit the information presented to the user. For example, a personalized search
engine could return information only from the field of interest to the user. In
the same way, a well-personalized Web site dynamically changes its content in
order to present to the user only relevant information, news or advertisements,
according to the previously created profile [10].

Explicitly asking users for some kind of relevance feedback may not always
be the best way to create their profiles. This is especially the case when
using key words. As Furnas, Landauer, Gomez and Dumais have shown in [11],
people use the same words to describe the same subject only 10-20% of the
time (see also [4]). Relevance feedback based on whole articles may not
be reliable either, because of the influence of subjective factors like novelty,
informativeness or familiarity to the user.

Some sites/portals offer customers the opportunity to receive a free
"passport". The users are asked to fill in a form and answer a set of common
questions that will be used as a primary source for the construction of their
profiles. This may be of great importance and can lead to significant
improvements. The problem is that people tend to get annoyed when asked to do
something in order to help the system. That is why several business sites/portals
have developed specialized mechanisms for automatic user profile construction.
Recent studies and applications in the field include automatic tracking and
recording of the user's activity when browsing the site: e.g. pages visited,
buttons clicked, hyperlinks followed, search queries entered etc. The information
collected this way is called clickstream and is stored in specially designed
clickstream data marts and Data Webhouses. Thus, the Web site/portal retains a
full history of the user's activity that permits the construction of a more
effective and objective profile [13,14,15,21]. In almost all cases the user
profile has a dynamic character and changes over time as new information becomes
available. The general source of additional information is the raw detail of the
recorded user activity: the clickstream. Most systems use a vector representation
of the user profile. This is very convenient and, as we will show later,
simplifies its creation, maintenance and usage. Although there are several
different techniques for generating vectors of the type described, we have chosen
Latent Semantic Indexing for several reasons, the primary one being that it is a
well-studied classic method that will allow us to concentrate on the specific
details of profile creation and usage that we want to study.

3 Latent Semantic Indexing 

Latent Semantic Indexing (LSI) is a powerful statistical technique for
fully automatic indexing and retrieval of information. LSI is generally applied to
texts and is a two-stage process that consists of (see [7,9,16] for details)
off-line construction of the document index and on-line response to user queries.
The off-line part of the process is the training part, when LSI creates its index.
First a large word-to-document matrix X is constructed, where cell (i,j)
contains the frequency of occurrence of the i-th word in the j-th document.
After that, a singular value decomposition (SVD) is performed, which
compresses the original space into a much smaller one with
just a small number of significant factors (usually 50-400). Each document is
then represented by a meaning vector of low dimensionality (e.g. 100). The
on-line part of LSI receives the query (pseudo-document) the user typed and finds
its corresponding vector in the document space constructed by the off-line part
using a standard LSI mechanism. Now we can measure the degree of similarity
between the query and the indexed documents by simply calculating the cosine
between their corresponding vectors.
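
The two LSI stages can be illustrated with a minimal NumPy sketch. It is not the SVDPACKC-based implementation used in the paper: the function names, the tiny matrix and the choice of k = 2 are assumptions made for illustration, and the query is placed in the document space with the usual fold-in projection.

```python
import numpy as np


def build_lsi(X: np.ndarray, k: int):
    """Off-line stage: truncated SVD of a word-by-document matrix X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # U_k (words x k), top-k singular values, document vectors (docs x k)
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(q: np.ndarray, U_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    """On-line stage: map a word-count vector (query or pseudo-document)
    into the k-dimensional document space."""
    return (q @ U_k) / s_k


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy word-by-document counts: 5 words, 4 documents (assumed data).
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

U_k, s_k, docs = build_lsi(X, k=2)
query = np.array([1, 1, 0, 0, 0], dtype=float)  # uses words 0 and 1 only
q_vec = fold_in(query, U_k, s_k)

# Rank the documents by cosine similarity to the query.
ranking = sorted(range(X.shape[1]),
                 key=lambda j: cosine(q_vec, docs[j]),
                 reverse=True)
```

Folding in a column of X reproduces that document's own vector, which is a convenient sanity check for the projection.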



4 Extended Boolean Operations 

We return now to the automatic creation of the vector representation of the
user's profile. Consider an e-commerce portal tracking users' clickstream activity
as discussed above. The information collected can be used in a variety of
ways, including analysis of the quality of the Web site structure and organization
[13]. There are several things we are interested in when constructing the
users' profiles, among which the most important are:

- Which sections/pages on the site does the customer visit most frequently? What
  do they contain?

- Which pages are "session killers" for our customer?

- How much time does the customer spend on the site?

- Who is our customer? How often does he or she visit the site?

- Has the customer purchased something and, if so, what? What kind of
  products?

- Is it a complaining customer who often returns our products?

Having collected information like this will allow us to create a sophisticated,
high-quality user profile that will permit offering him or her personalized news,
advertisements, banners etc. We would like to create a profile vector that is
closely aligned to the vectors of the pages the user is interested in and is far
from those which seem to be uninteresting. A page of interest for the user may
be a page that he or she visits often or spends a long time on. The longer the
user stays on the page, the more relevant it may be to his or her interests. On
the other hand, we must beware of taking extremely long times too seriously
(the user may have just left the browser open), while giving higher
weights to the pages related to the user's purchases, if any. So, we would like to
combine the vectors of the relevant pages, weighted according to the frequency
of the visits and the duration of the time spent there, in order to obtain the
profile vector. This implies the need for a weighted OR similarity measure. We
would also like to exclude the pages that seem to be strongly uninteresting for
the user: e.g. those where he or she (often) cancels the session, or those we know
he or she is not interested in according to relevance feedback, possibly taken
from the "passport" registration information the user supplied. This implies the
need for an excluding NOT (MINUS) Boolean operation. These examples show that the
extended Boolean operations play a major role in the process of user profile
creation.
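
One way to turn the clickstream observations above into weights for the query components is sketched below. The text does not give a weighting formula, so everything here is an assumption: the function name `page_weights`, the product of visit count and dwell time, the cap on dwell time (to discount a browser left open) and the purchase boost factor.

```python
import numpy as np


def page_weights(visits, dwell_seconds, purchased,
                 dwell_cap=600.0, purchase_boost=2.0):
    """Return normalised weights, one per visited page.

    visits         -- number of visits to each page
    dwell_seconds  -- total time spent on each page
    purchased      -- True where the page relates to a purchase
    dwell_cap      -- assumed cap so very long stays are not over-counted
    purchase_boost -- assumed multiplier for purchase-related pages
    """
    dwell = np.minimum(np.asarray(dwell_seconds, dtype=float), dwell_cap)
    raw = np.asarray(visits, dtype=float) * dwell
    raw = raw * np.where(np.asarray(purchased, dtype=bool), purchase_boost, 1.0)
    return raw / raw.sum()


# Two pages: the second one was left open for hours, so its time is capped.
w_plain = page_weights([1, 1], [100, 10_000], [False, False])
w_boost = page_weights([1, 1], [100, 10_000], [True, False])
```

The resulting weights could then be passed to a weighted similarity measure as the per-component weights.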

Another possibility is to design a composite profile by keeping several different
vectors whose weighted combination gives the profile vector. This results in
improved performance, since we can manage the different vectors the profile is
built of separately and combine (some of) them only when needed. This allows
the creation of a dynamic profile that may be recalculated when needed,
with changed weights. For example, we may like to drop some elements of the
user profile that are no longer relevant (because they are old), or at least
reduce their weights.

Suppose we have collected a complete history of the customers' purchases
and clickstream activity, and want to send the users several advertisements by
e-mail or show them on the Web when they browse the site/portal. We have already
built an LSI index based on the text description of each product. We can
think of the purchases/clicks as query components and of the advertisement as
a new document in the same space. We need some kind of similarity function
that will give us a measure of the similarity between our advertisements and
the user's profile. Let us define $d_1, d_2, \ldots, d_n$ as the distances (in the
LSI sense) between the ad and the n components of the query. The classic LSI
algorithm calculates the cosines between the vectors in order to find the degree
of their similarity. Most of the similarity measures for the Boolean operations we
propose below are based on Euclidean distances, although other distances can be
used (angle, Manhattan, Chebyshev, power, etc.).

There are several similarity measures we have experimented with: 

OR-similarity measure. This measure depends only on the minimal distance
between the document and the query components and has the following
general representation: $S_{or} = f(\min(g(d_1), g(d_2), \ldots, g(d_n)))$, where
$f(x)$ and $g(x)$ are some one-argument functions. If we have more information
about the query we can add weights to the query components and change $g(x)$ to
$g(x,w)$. The formula then becomes
$S_{or} = f(\min(g(d_1,w_1), g(d_2,w_2), \ldots, g(d_n,w_n)))$. The OR
similarity measure has well separated peaks at the query component vectors.
The similarity measure for two- and three-component queries with
$f(x) = 1/(1+x)$ and $g(x) = x$ is shown in Figure 1.

Fig. 1. OR similarity for two- and three-component query
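
A minimal sketch of the OR measure follows, using $f(x) = 1/(1+x)$ and Euclidean distances as in the text. The weighted form `g(d, w) = d / w` is an assumption made for illustration (a larger weight shrinks the effective distance); the text does not specify how $g$ uses the weight.

```python
import numpy as np


def f(x):
    return 1.0 / (1.0 + x)


def g(d, w=1.0):
    # Assumed weighting: with w = 1 this reduces to g(x) = x.
    return d / w


def or_similarity(doc, components, weights=None):
    """S_or = f(min_i g(d_i, w_i)), d_i = Euclidean distance to component i."""
    doc = np.asarray(doc, dtype=float)
    comps = np.asarray(components, dtype=float)
    d = np.linalg.norm(comps - doc, axis=1)
    w = np.ones(len(comps)) if weights is None else np.asarray(weights, dtype=float)
    return float(f(np.min(g(d, w))))


components = [[3.0, 4.0], [0.0, 1.0]]       # two query components
s_far = or_similarity([0.0, 0.0], components)   # nearest component at distance 1
s_hit = or_similarity([3.0, 4.0], components)   # document equals a component
```

Only the nearest component matters, which is what produces the separate peaks described above.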

AND-similarity measure. This measure depends only on the sum of the distances
between the document and the query components. It has the following general
representation: $S_{and} = f(g(d_1,w_1) + g(d_2,w_2) + \cdots + g(d_n,w_n))$.
This measure can usually be thought of as a superposition of the distinct
similarity measures of the query components. The similarity measure for two- and
three-component queries with $f(x) = 1/(1+x)$ and $g(x) = x$ is shown in Figure 2.




Fig. 2. AND similarity for two- and three-component query 



Combination of the previous two (AND-OR). This similarity measure
is a combination of the previous two: $S_{and\text{-}or} = f(S_{and}, S_{or})$. We
can use a linear combination of the $S_{or}$ and $S_{and}$ measures:
$S = k S_{or} + (1-k) S_{and}$, where $k$ is a constant and $0 \le k \le 1$.
Figure 3 shows the two- and three-component query results for $k = 0.5$. We still
have two distinct parts, as with the OR-similarity function, but higher values in
the middle region between them, just as with the AND-similarity function.
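
The AND measure and the linear AND-OR combination can be sketched together. As before this is an unweighted illustration with $f(x) = 1/(1+x)$, $g(x) = x$ and Euclidean distances; the function names are hypothetical.

```python
import numpy as np


def f(x):
    return 1.0 / (1.0 + x)


def distances(doc, components):
    """Euclidean distances d_1..d_n from the document to each query component."""
    return np.linalg.norm(np.asarray(components, float) - np.asarray(doc, float), axis=1)


def or_similarity(doc, components):
    return float(f(distances(doc, components).min()))


def and_similarity(doc, components):
    return float(f(distances(doc, components).sum()))


def and_or_similarity(doc, components, k=0.5):
    """S = k * S_or + (1 - k) * S_and, with 0 <= k <= 1."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    return k * or_similarity(doc, components) + (1.0 - k) * and_similarity(doc, components)


components = [[3.0, 4.0], [0.0, 1.0]]
doc = [0.0, 0.0]            # distances 5 and 1 to the two components
s_and = and_similarity(doc, components)
s_mix = and_or_similarity(doc, components, k=0.5)
```

Setting `k = 1` recovers the pure OR measure and `k = 0` the pure AND measure.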

Fig. 3. Combined similarity for two- and three-component query, k = 0.5

MINUS and Binary NOT (AND-NOT)-similarity measure. In case
we want to exclude a vector we can apply two different similarity measures:
MINUS and NOT. For the MINUS similarity measure, if the vector considered
is more similar to the exclude vector it will receive a similarity measure of 0 (see
the second clause below). Otherwise we return a similarity measure that takes
into account the distance to the include vector only. Here $d_1$ is the distance
to the include vector and $d_2$ the distance to the exclude vector. We can use the
following MINUS-measure: $S_{minus} = d_1$ when $d_1 < d_2$, and $S_{minus} = 0$
otherwise. The result is shown on the left of Figure 4.

The problem with this measure is that it takes $d_2$ into account only when
deciding whether to cut the value. A more sophisticated implementation may
be used: the NOT (AND-NOT) similarity measure. If the document is more
similar to the exclude document it will receive a similarity measure of 0;
otherwise we return a similarity measure between 0 and 1 that takes into
account the distances to both documents. For example, we can use the following
NOT-measure: $S_{not} = 1 - d_1/(1 + d_2)$ when $d_1 < d_2$, and $S_{not} = 0$
otherwise. The result is shown on the right of Figure 4.
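
The two excluding measures can be written down directly from the formulas as they are printed. The sketch takes the two distances as inputs, with `d1` the distance to the include vector and `d2` the distance to the exclude vector; the function names are hypothetical. Note that the MINUS formula as printed returns `d1` itself in the first clause, and the sketch follows it literally.

```python
def minus_similarity(d1: float, d2: float) -> float:
    """MINUS measure as printed: d1 when d1 < d2, otherwise 0."""
    return d1 if d1 < d2 else 0.0


def not_similarity(d1: float, d2: float) -> float:
    """NOT (AND-NOT) measure: 1 - d1 / (1 + d2) when d1 < d2, otherwise 0."""
    return 1.0 - d1 / (1.0 + d2) if d1 < d2 else 0.0


closer_to_include = not_similarity(1.0, 3.0)
closer_to_exclude = not_similarity(3.0, 1.0)
```

Because `d1 < d2` in the first clause, `d1 / (1 + d2)` is below 1, so the NOT measure stays between 0 and 1 and grows as the exclude vector moves further away.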




Fig. 4. MINUS and NOT (AND-NOT) similarity measures 



5 Application to Religious and Sacred Texts 

The first step toward the construction of the dynamic user profile is the
development of the appropriate extended Boolean operations. We have tested
the performance of the Extended Boolean Operations presented above on a large
number of different corpora containing thousands of documents by thousands
of words and hundreds of megabytes. We will demonstrate how the functions we
introduced above work on a small corpus of religious and sacred texts we found
at http://davidwiley.com/religion.html. We selected 196 different religious
and sacred texts from 14 categories: apocrypha (acts, apocalypses, gospels,
writings), Buddhism, Confucianism, Dead Sea Scrolls, The Egyptian Book of the
Dead, Sun Tzu: The Art of War, Zoroastrianism, The Bible (Old and New Testaments),
The Quran and The Book of Mormon. The experiments were made in a
30-dimensional space; before the SVD, the frequencies in X
(196 documents and 11451 words) were replaced with their logarithms. Figure 5
illustrates the inter-document similarities given by the correlation matrix
(196x196), shown in 5 different shades for the five correlation intervals: black
(87.5-100%), dark gray (75-87.5%), gray (62.5-75%), light gray (50-62.5%) and
white (0-50%). The dark rectangles on the main diagonal show the high correlation
between texts belonging to the same religion.
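
The preprocessing and the shading scheme can be sketched as follows. Several details are assumptions, since the text does not spell them out: `log(1 + count)` is used so that zero counts stay defined, documents are represented by their rows of V scaled by the singular values, and the "correlation" is taken to be the cosine between document vectors.

```python
import numpy as np


def doc_similarity_matrix(X: np.ndarray, k: int) -> np.ndarray:
    """Log-scale the counts, reduce to k dimensions, return doc-by-doc cosines."""
    logX = np.log1p(X)                       # assumed form of the log replacement
    U, s, Vt = np.linalg.svd(logX, full_matrices=False)
    docs = Vt[:k, :].T * s[:k]               # one k-dimensional row per document
    unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return unit @ unit.T


def shade(c: float) -> str:
    """Map a similarity in [0, 1] to the five intervals used for Figure 5."""
    edges = [0.5, 0.625, 0.75, 0.875]
    names = ["white", "light gray", "gray", "dark gray", "black"]
    return names[int(np.searchsorted(edges, c, side="right"))]


# Toy counts: 4 words, 3 documents (assumed data); the paper uses k = 30.
X = np.array([[2, 0, 1],
              [1, 0, 0],
              [0, 3, 1],
              [0, 1, 2]], dtype=float)
S = doc_similarity_matrix(X, k=2)
```

Applying `shade` to every entry of `S` gives the five-level picture described above.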




Fig. 5. Correlation between religious texts (196 x 196) 



We developed several specialized command line tools supporting
both the on-line and off-line LSI stages, using the standard SVDPACKC library
routines for the singular value decomposition [4]. An LSI-based natural language
query search engine has been developed on top of these tools and made available
on the Web at http://nlp.rila.bg.

Below we present eight tables that contain experimental results
obtained for two example texts from the corpus belonging to different
well-separated clusters (religions): the first chapter of Sun Tzu's Art of
War (suntzu1.txt) and the first chapter of the Confucianism religious texts
(conf1.txt). The first table contains the ranked list of the documents most
similar to the first chapter of Sun Tzu's Art of War, with the corresponding
degree of similarity. Then follow seven tables containing the results of applying
different Boolean operations. Suppose the user's clickstream activity
shows he or she is interested in information common to both documents. The
system needs to perform a Boolean AND operation on the LSI vectors of those
documents and to produce a ranked document list in order to choose the relevant
documents. The results are shown in the second table. The following two tables 
contain the results of the application of two different excluding operations: NOT 
and MINUS, whose behavior has been discussed above. Then follow four tables 
showing the results of the application of four different types of OR operations 
for different values of k (see above). 





6 Discussion 

The results presented above show that the proposed Boolean operations perform
well and can be used successfully in the construction of meaning vectors from
any kind of Boolean expression. As mentioned above, the correct
application of the Boolean operations is a key point in the development of the
dynamic user profile. The operations can also be useful in the construction of a
natural language query system giving users the opportunity to combine any kind
of natural language queries. After the ranked list has been returned, the user can
provide the system with relevance feedback by marking some of the documents
as relevant or non-relevant to his or her query. The system will then provide a
second ranked list of documents by combining the vector of the user query with the
vectors of those documents using the appropriate extended Boolean operations,
as is done at http://lsi.research.telcordia.com/lsi-bin/lsiQuery.

7 Conclusion 

We think the application of dynamic vector-based user profiles by means of the
extended Boolean operations presented above is very promising. We continue
our work by experimenting with different kinds of extended Boolean similarity
functions and their behavior on different kinds of corpora. Research has been
started on applying methods for meaning vector creation
other than LSI, because the latter cannot easily be scaled to extremely large
quantities of documents. The next stage is the design of a clickstream activity
capture module and a sophisticated analyzer of user behavior in order to move
further towards the creation of the personalized Web site.


Abstract. Most tools for supporting language learners concentrate, at
best, on the syntactic correctness of the user’s productions. There are 
a number of reasons for this: (i) it’s hard enough to develop programs 
that can perform syntactic analysis, and even harder to develop ones 
that can construct meaning representations. But for a CALL tool, we 
need to construct meaning representations and then see whether they 
are correct, (ii) if we do have the computational tools for comparing the 
user’s production with some ‘correct’ output, we still need some way of 
obtaining such target analyses for comparison. In situations where the 
user is allowed to produce open-ended text, it is very hard to see where 
such target analyses will come from. 

The current paper will show how to make sure that what the user says 
makes sense at all, and then how to compare it with a target utterance. 
The key to both these activities lies in extensive use of an inference 
engine which is capable of producing models that give a picture of what 
someone who utters a given sentence has in mind. 



1 The Task 

We want to be able to support intermediate language learners by giving them 
hints about the content of what they have said. Clearly, language learners require 
help with grammar - they need to know that ‘I see the man who you were talking 
to’ is better than ‘Me sees the man what you were talking to’ - but intermediate 
learners also need help with content. They need help with general issues relating 
to the use of open class words (why do English people get ‘on’ buses and ‘in’ 
cars?) and with the semantic effects of tense and aspect markers (why can’t 
you say ‘I am knowing her’, why does ‘he is living in Buxton’ seem to report a 
temporary state of affairs?); and they need help with the meanings of specific 
terms - they need to be able to explore the consequences of saying ‘a bank
is an institution that makes a profit for itself by lending out your money and
also charges you for the privilege’, and to see if this is what someone from the
financial services sector would say about banks.

The current paper reports on an attempt to provide this kind of feedback by 
constructing a sparse representation of the content of what the learner has said, 
and then reasoning about what else they are likely to have in mind. The way
we construct meaning representations has been described elsewhere [10,11]. It is
worth noting here, however, that these representations are constructed strictly 
compositionally: in other words, the semantic analysis depends critically on a 
prior syntactic analysis. The grammar we use is entirely constraint based, and 
permits relaxation of arbitrary constraints, including ones relating to word order, 
completeness (the requirement that everything which is expected is present) and 
coherence (the requirement that everything that is present is required) [13,12]. 
With such a grammar, it is comparatively easy to find and diagnose syntactic 
errors in what the user writes - we simply specify that certain constraints are 
‘soft’. Analyses that involve violation of such soft constraints will be considered if 
no error-free analyses can be found. Thus the system described here will serve the 
purpose of helping the user improve their understanding of the grammar of the 
language they are learning: this is not, however, the focus of the current paper - 
see [14,3] for various applications of this grammar for dealing with syntactically 
ill-formed texts. More importantly for the present paper, we are able to make a 
reasonable guess about what the learner was trying to write. This is crucial if we 
want to construct an interpretation of what they have written: if our meaning 
representations are to be obtained compositionally [15,7], then we have to have a 
syntactic analysis to obtain them from. The use of a constraint-based grammar 
with soft constraints thus serves two purposes for us. It makes it possible to 
provide the learner with feedback about the grammatical correctness of what 
they have written, and it enables us to assign a grammatical analysis and thence 
to obtain a meaning representation. 

2 Meanings and Interpretations 

In order to see whether what the user has typed makes sense and means what he 
wants it to mean, we have to construct a representation of its meaning and then 
reason about it. Our meaning representations are couched in anchored constructive
λ-calculus [17], and contain treatments of a range of semantic phenomena.
The reasons why we choose this language, and the specific treatments we choose 
for the phenomena in question, need not concern us here. Any computational 
account of natural language will necessarily involve producing a meaning rep- 
resentation couched in some formal language, and giving some account of these 
phenomena. The techniques described below will apply to any such approach. 

To make this concrete, Fig. 1 shows the logical form that we obtain for

(1) A bank lends money.

This contains much of the information captured in (1), made more explicit (e.g.
the fact that the bank is the agent of the lending event) and expressed in a
formal language that a computer might be able to do something with. What we
want to do with it is to see whether it (a) makes sense and (b) is accurate.

To tell whether something makes sense, we have to have some notion of 
sensibleness. In other words, we need to have some notion of which ideas can

\[
\begin{array}{l}
\exists A : \{A \text{ is interval} \;\&\; ends\_after(ref(\lambda B(speech\_time(B,1))), A)\} \\
\quad \exists C : \{aspect(simple, A, C)\} \\
\qquad \exists D \exists E : \{memb(E, C)\} \\
\qquad\quad bank(D) \\
\qquad\quad \&\; \theta(E, agent, D) \\
\qquad\quad \&\; lend(E) \\
\qquad\quad \&\; E \text{ is event} \\
\qquad\quad \&\; \forall F : \{\theta(E, object, F)\}\; money(F)
\end{array}
\]

Fig. 1. Logical form for ‘A bank lends money’



go together, of what kinds of things can take part in what activities. We could
capture this by making statements like the following:

MP1: $\forall A \forall B : \{\theta(A, agent, B)\}\; animate(B)$    Agents are animate

MP2: $\forall A\, \neg(idea(A) \;\&\; animate(A))$    Ideas are not animate

Then if somebody said

(2) An idea slept.

we would realise that this was not possible, since MP1 says that the agent of the
sleeping event has to be animate, but MP2 says that ideas are inanimate.
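
The clash between MP1 and MP2 for (2) can be reproduced with a very small sketch. It is not the inference engine discussed in this paper: facts are assumed to be plain tuples, the two postulates are hard-coded, and the function name `check_sense` is hypothetical. The point is only that the check can return the reason for the clash rather than a bare "no".

```python
def check_sense(facts):
    """Return a list of explanations for every MP1/MP2 clash in `facts`.

    Facts are tuples such as ("idea", "x"), ("sleep", "e") or
    ("theta", "e", "agent", "x").
    """
    derived = set(facts)
    reasons = {}
    # MP1: the agent of any event is animate.
    for fact in facts:
        if fact[0] == "theta" and fact[2] == "agent":
            conclusion = ("animate", fact[3])
            derived.add(conclusion)
            reasons[conclusion] = (
                f"MP1: {fact[3]} is the agent of {fact[1]}, so it must be animate"
            )
    # MP2: nothing is both an idea and animate.
    problems = []
    for fact in sorted(derived):
        if fact[0] == "idea" and ("animate", fact[1]) in derived:
            why = reasons.get(("animate", fact[1]), "stated directly")
            problems.append(
                f"MP2: {fact[1]} is an idea, but ideas are not animate ({why})"
            )
    return problems


idea_slept = {("idea", "x"), ("sleep", "e"), ("theta", "e", "agent", "x")}
man_slept = {("man", "x"), ("sleep", "e"), ("theta", "e", "agent", "x")}
```

`check_sense(idea_slept)` yields one explanation that names both postulates, while `check_sense(man_slept)` yields none.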

That’s not bad, but if we simply tell our learner that what they have said 
makes no sense then they won’t learn very much. They need to know why it 
makes no sense. Suppose, for instance, they had said 

(3) Colourless green ideas sleep furiously. 

We would presumably want to tell them that this sentence contains more flaws 
than (2), since ideas can’t be coloured, green things can’t be colourless, and 
sleeping can’t be done furiously. 

If we want to make much progress with this kind of reasoning about what the 
learner has said, we need to have access to considerable amounts of background 
information, and we need to be able to deploy that information. Our background 
information is expressed as a set of ‘meaning postulates’ [4] - statements about 
the relationships between concepts. These meaning postulates are not sets of
necessary and sufficient conditions, nor is there any simple set of primitive
concepts to which all others can be reduced. They are, rather, a set of mutual
constraints on concepts. We cannot define most common terms, but we can
explain how they connect to one another, and we can discuss what someone
who says one thing must also be committed to (so if a learner says ‘A bank lends
money to a customer’ they should also be committed to ‘A bank has money over
which it has control’; and if they say ‘It is raining’ they should also agree that
‘Drops of rain are falling from the sky’. [5] talks of ‘semantic traits’ - general
properties that you would expect an item to have if a particular word is used
for describing it).

We currently have about 200 such meaning postulates, ranging from very
general statements about sets, linear orders, and similar abstract notions (e.g. 
MP3) to quite precise observations about specific terms (MP4, which says that 
if you fire someone then you must have been their employer, and that the firing 
action terminates the employment relationship). 

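MP3: $\forall A \forall B : \{B > A\}\; \forall C : \{A > C\}\; B > C$

MP4: $\forall A : \{fire2(A)\}$
     $\forall B : \{\theta(A, object, B)\}$
     $\forall C : \{\theta(A, agent, C)\}$
     $\exists D : \{employ(D)\}$
     $\theta(D, object, B) \;\&\; \theta(D, agent, C)$
     $\&\; termination(A, D)$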
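
To show how a postulate like MP4 yields new commitments, here is a minimal forward-chaining sketch. It is an illustration under assumptions, not the theorem prover used in this work: facts are plain tuples, the existential variable D is replaced by a fresh constant (`d0`, `d1`, ...) that is assumed not to clash with existing names, and the function name `apply_mp4` is hypothetical.

```python
import itertools


def apply_mp4(facts):
    """Add, for every firing event, the employment event that MP4 entails."""
    fresh = (f"d{i}" for i in itertools.count())   # fresh names for the exists-D
    derived = set(facts)
    for fact in sorted(facts):
        if fact[0] != "fire2":
            continue
        a = fact[1]
        objects = [f[3] for f in facts if f[:3] == ("theta", a, "object")]
        agents = [f[3] for f in facts if f[:3] == ("theta", a, "agent")]
        for b in objects:
            for c in agents:
                d = next(fresh)
                derived |= {
                    ("employ", d),
                    ("theta", d, "object", b),
                    ("theta", d, "agent", c),
                    ("termination", a, d),
                }
    return derived


# 'Acme fired Mary' as a set of facts (assumed encoding).
facts = {
    ("fire2", "e"),
    ("theta", "e", "object", "mary"),
    ("theta", "e", "agent", "acme"),
}
closure = apply_mp4(facts)
```

After the call, `closure` also records that Acme employed Mary and that the firing terminated that employment.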

Obtaining meaning postulates of this kind is extremely labour-intensive, and 
even if you have the resources it is very difficult to come up with a consistent 
framework to work in ([8] report on the difficulties that can arise when you try 
to do this on a large scale). Nonetheless, you cannot expect to tell a learner much 
about the content of what they have said unless you have such a set of meaning 
postulates. If you don’t know how the meanings of words are related, how can 
you possibly tell whether what the learner has said makes sense and conveys his 
intended meaning? 

We will assume, therefore, that our 200 or so meaning postulates could be 
elaborated to provide reasonable coverage of some specific set of concepts in a 
restricted domain. If we have such a set, what should we do with it? 

We clearly need an inference engine to extract the consequences of what the 
user has said. An inference engine, in the most general interpretation of the term, 
is a program which extracts implicit information from a body of explicitly stated 
facts. There is a range of such programs, from simple database query engines to
theorem provers for complex logics. The simpler (= less expressive) the language 
in which the facts are stated, the easier it is to write an appropriate inference 
engine. Natural language is an exceptionally expressive means of representing 
facts and rules, and as a consequence any formal language which is used for 
representing the meanings of natural language utterances will also have to be 
extremely expressive. We choose to use a constructive version of [16]’s ‘property
theory’.^

When reasoning with MPs like MP2, we view not(P) as a shorthand for P 
==> absurd (this is the standard treatment of negation in constructive logic). 
With this, we can show that it is absurd to utter (2) if you also believe MP1 and
MP2. We usually use this to choose between alternative readings of ambiguous
sentences, since if someone says something which has a number of readings then 
we can easily rule out any which are clearly impossible (see [18] for a similar 
approach to disambiguation). 

^ There are other languages of similar expressive power, e.g. the non-well-founded 
set theory underlying situation semantics [2,1]. All such languages share a common 
property: it is very difficult to provide inference engines for them.

Simply showing that what the user has written makes no sense is not, however,
all that much use as a support for learners. We want to be able to show
them why it makes no sense, if this is indeed the case; and we want to be able to 
show them the discrepancies between what they have said and what we expected 
them to say. 

These are slightly different issues. Our learner may have said something which 
contradicts general commonsense rules - things like the claim that only physical 
objects can be coloured, or that mental objects cannot be animate. These rules 
can be hard-coded in our meaning postulates, as in MP2. They may, on the 
other hand, have said something which is not self-evidently wrong, but which is 
nonetheless at odds with what they were expected to say. This will be particularly 
likely when they are trying to use new vocabulary items - if, for instance, they 
have been asked to describe some entity that they are only just learning about. 

In either case, we need to make the ramifications of what they have said 
explicit. We need to see what they have said, and what follows from it, and 
compare this with what we expected. This is a rather non-standard way of using 
an inference engine. Instead of asking the inference engine whether some specific
proposition X is provable, we want a picture of everything that someone who 
uttered that proposition might have had in mind. We want a ‘model’ which 
makes what the learner has said plus everything else we know true. If we have 
such a model, we can compare it with a target model, produced for instance on 
the basis of our own definition of the term in question, to see what discrepancies 
there are between the two: what is in the user’s model but not in the target, 
what is in the target but not in the user’s model? 

We produce models by setting our theorem prover the task of proving that 
what the learner has said is inconsistent with everything else that is embodied 
in the meaning postulates and the prior discourse. We do not, in fact, want to 
succeed in this task. If what the learner has said is provably inconsistent then all 
we can do is report that it makes no sense. If the attempted proof of inconsistency 
fails, however, then we can use it to construct a model. The theorem prover we 
use is a version of [9]’s model generation theorem prover, extended to cope with 
the intensional operators of property theory [6]. Model generation proceeds, as 
its name suggests, by enumerating partial models. For standard theorem proving, 
the aim is to show that the negation of the goal has no models: the mechanism 
can easily be adapted, however, to show that the proposition under consideration 
does have a model, and to show what that model is like. As an example, Fig. 2
shows the model we obtain after processing (1), where the word ‘fired’ has one 
possible reading where it denotes the act of terminating someone’s contract and 
another related to the discharge of firearms. 

(4) John fired his secretary. 

The model in Fig. 2 contains a number of entities, and specifies what they
are like and how they are related (including, for instance, the fact that the
system was previously unaware that John had a secretary, and hence had to 
‘accommodate’ this fact). If we show this model to the user, they can see some

fire(#121)
θ(#121, object, #122)
θ(#121, agent, #92)
#rel(#121)
secretary(#122)
of(#122, #92)
unsexed(#122)
n(#122)
firearm(#122)
current_discourse_state(1)
speech_time(#3(1), 1)
ends_before(#3(1), #120)
ends_after(#3(1), #3(1))
instantaneous
start(#4(#120), #120)
start(#4(#121), #121)
start(#4(#3(1)), #3(1))
end(#5(#120), #120)
end(#5(#121), #121)
end(#5(#3(1)), #3(1))
#5(#3(1)) > #5(#120)
aspect(simple, #120, #121)
accommodated(secretary(#122), of(#122, #92))

Fig. 2. Model for (1)



of the consequences of what they have said^; and if we compare it with another 
one, generated on the basis of our own understanding of what has happened, 
then we can show them some of the differences between what they should have 
said and what they did say. 

3 Gross Errors 

For gross errors, we can rely on general knowledge to show that what the learner 
has said is problematic. MP2', for instance, says that it would be strange to 
think of an idea as being animate. 

MP2': $\forall A : \{idea(A) \,\&\, animate(A)\}\ weird(A)$    (Ideas are normally inanimate)

If we use MP1 and MP2' as the basis for reasoning about (2), we end up with
the model in Fig. 3. 

This is useful (though opaque: as noted above, we need to provide a better, 
probably graphical, presentation of the model to the learner if it is to actually 
be of use to them). It shows the user what the world would be like if what 
they had said were true, and it marks the idea #1533 as being odd. It doesn’t, 
however, explain why it is odd. To get that we need to supplement our meaning 
postulate with a reason, replacing MP2' by 

MP2'': $\forall A : \{idea(A) \,\&\, animate(A)\}\ weird(A) \text{ because } idea(A) \,\&\, animate(A)$    (Remembering why things are weird)

(and while we’re about it, we’ll add 

^ or at least they could if we provided them with a reasonable presentation of the
model, for instance by using a graphical interface. Such an interface is currently
under development.

type(#1531, interval)
sharedloc(#1531, agent)
memb(#1531, #1532)
sharedagentloc(#1531)
view(#1531, #1504(#1531))
criterial(#1531, sleep(#1533))
static(#1531)
sleep(#1531)
θ(#1531, agent, #1533)
type(#1531,
animate(#1533)
idea(#1533)
type(#1533, abstract)
ends_before(#1428(1), #1531)
start(#1429(#1531), #1531)
at(#1429(#1531), sleep(#1533))
type(#1429(#1531), instant)
#1429(#1428(1)) > #1430(#1531)
end(#1430(#1531), #1531)
at(#1430(#1531), sleep(#1533))
completion(#1430(#1531), #1531)
type(#1430(#1531), instant)
type(#1504(#1531), concrete)
aspect(simple, #1531, #1532)
weird(#1533)

Fig. 3. Initial model for ‘idea slept.’



MP5: $\forall A : \{green(A) \,\&\, \neg(concrete(A))\}\ weird(A) \text{ because } green(A) \,\&\, \neg(concrete(A))$    (Green things must be physical objects))

Then if we try to get an interpretation of 
(5) green idea slept. 
we get the model in Fig. 4. 



type(#1534, interval)
sharedloc(#1534, agent)
memb(#1534, #1535)
sharedagentloc(#1534)
view(#1534, #1504(#1534))
criterial(#1534, sleep(#1536))
static(#1534)
sleep(#1534)
θ(#1534, agent, #1536)
type(#1534, event)
idea(#1536)
animate(#1536)
green(#1536, λA.(idea(A)))
type(#1536, abstract)
ends_before(#1428(1), #1534)
start(#1429(#1534), #1534)
at(#1429(#1534), sleep(#1536))
type(#1429(#1534), instant)
#1429(#1428(1)) > #1430(#1534)
end(#1430(#1534), #1534)
at(#1430(#1534), sleep(#1536))
completion(#1430(#1534), #1534)
type(#1430(#1534), instant)
type(#1504(#1534), concrete)
aspect(simple, #1534, #1535)
weird(#1536) because idea(#1536) & animate(#1536)
weird(#1536) because green(#1536) & ¬(concrete(#1536))

Fig. 4. Diagnostic model for ‘green idea slept.’

Further elaboration along these lines will make it possible to show the user a
picture corresponding to what they have said, and to pick out things that they
have said that conflict with our general knowledge.
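
The idea of attaching a reason to each ‘weird’ conclusion can be illustrated with a very small rule checker. This is a sketch under strong assumptions, not the theorem prover described above: each entity is reduced to a set of property names, the rule table `RULES` and the function `diagnose` are hypothetical, and a property that is not listed is treated as false (a closed-world shortcut that the constructive logic of the paper does not make).

```python
# Each rule: (properties that must hold, properties that must be absent, reason).
# Absence is read as negation here, which is an assumption of this sketch.
RULES = [
    ({"idea", "animate"}, set(), "idea & animate"),
    ({"green"}, {"concrete"}, "green & not concrete"),
]

def diagnose(model):
    """model maps an entity name to the set of properties asserted of it.
    Returns a list of (entity, reason) pairs, one for every rule that fires."""
    findings = []
    for entity, props in sorted(model.items()):
        for required, forbidden, reason in RULES:
            if required <= props and not (forbidden & props):
                findings.append((entity, reason))
    return findings

# Illustrative fragment of the model for 'green idea slept'.
model = {
    "#1536": {"idea", "animate", "green", "abstract"},
    "#1534": {"sleep", "event"},
}

for entity, reason in diagnose(model):
    print(f"weird({entity}) because {reason}")
```

Because every finding carries its own reason, an entity that is odd for two independent causes is reported twice, as in Fig. 4.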

4 Fine Errors 

We also, however, need to consider whether what the learner has said is what 
we wanted them to say. The use of meaning postulates that constrain the use of 
everyday terms will tell us about major misunderstandings, but when we want 
to know about our learner’s comprehension of technical material we have to be 
a bit more delicate. 

The approach we will take is to compare the output of what the learner says 
with the output of an expert. Suppose, for instance, that the learner has been 
asked to say what a bank does, and responds by typing 

(6) A bank lends money to its owner. 

It is not clear what we should say. It all depends on what we think the learner 
should know by now. Suppose that we think that they should know that a bank 
borrows money from some of its customers and lends it to others. What we 
should probably do is to show them a picture of the borrowing and lending 
activities that a bank takes part in. Suppose we had, as the teacher, input the 
following sentences: 

(7) A bank lends money to some customers. It borrows money from others. 

A reasonable response to the user’s input might be to highlight the differences
between what follows from (6) and (7). To do that, we have to compute those
differences and then decide which ones matter. Fig. 5 shows the models that we 
obtain for these sentences. 

These two models contain different entities, introduced by terms such as ‘its 
owner’ and ‘some customers’. We need to find the mapping between entities 
which minimises the differences between the two models, which we can do as 
shown in Fig. 6 (where the lending events #114 and #104 have been matched, 
and the agent #119 of #114 from the first model has consequently been identified 
with the agent #102 of #104). 
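
A brute-force version of this search can be sketched as follows. It is an illustration only, not the procedure used to produce Fig. 6: facts are assumed to be tuples whose arguments are entity names beginning with ‘#’, role names such as `agent` and `to` are treated as predicates, and the helper names (`entities`, `rename`, `best_mapping`) are hypothetical. Every injective assignment of user entities to target entities is tried, and the one leaving the smallest symmetric difference is kept.

```python
from itertools import permutations

def entities(facts):
    """Collect entity names (strings starting with '#') used in a set of facts."""
    return sorted({a for f in facts for a in f[1:] if a.startswith("#")})

def rename(facts, mapping):
    """Apply an entity mapping to every argument of every fact."""
    return {(f[0],) + tuple(mapping.get(a, a) for a in f[1:]) for f in facts}

def best_mapping(user, target):
    """Try every injective mapping from user entities to target entities
    (or to nothing) and keep the one that leaves the fewest differences."""
    u, t = entities(user), entities(target)
    best_cost, best_map = None, None
    for image in permutations(t + [None] * len(u), len(u)):
        mapping = {a: b for a, b in zip(u, image) if b is not None}
        cost = len(rename(user, mapping) ^ target)
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, mapping
    return best_cost, best_map

# Simplified fragments of the two models in Fig. 5.
user = {("lend", "#114"), ("agent", "#114", "#115"), ("bank", "#115"),
        ("to", "#114", "#119"), ("owner", "#119")}
target = {("lend", "#104"), ("agent", "#104", "#105"), ("bank", "#105"),
          ("to", "#104", "#102"), ("customer", "#102")}

cost, mapping = best_mapping(user, target)
print(cost, mapping)
```

The number of candidate mappings grows factorially with the number of entities, so exhaustive search of this kind is only usable on very small models; anything larger needs pruning heuristics.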

And then we need to work out what to do with this mapping - should we tell 
the learner about all the discrepancies between what he has said and what we 
wanted him to say, or just about the things that he has said that we were not 
expecting, or just the ones that he did not say that we were expecting? 

5 Conclusions 

The techniques outlined above make it possible to investigate the content of 
what a learner has written, as well as checking their grammar. We can look for 
gross errors, as in Section 3, by checking whether the user has said anything 
which leads to an apparent contradiction; and we can check whether they said
what we expected, as in Section 4, by comparing the model that results from
interpreting what they said with the model that results from what the teacher
said.

A bank lends money to its owner.
lend(#114)
θ(#114, to, #119)
θ(#114, agent, #115)
bank(#115)
n(#115)
owner(#119)
of(#119, #115)
f(#59(#115))
card(#59(#115), 1)
accommodated(owner(#119), of(#119, #115))

A bank lends money to some customers. It borrows money from others.
customer(#102)
card(#102, pl)
lend(#104)
θ(#104, to, #102)
θ(#104, agent, #105)
bank(#105)
n(#105)
f(#59(#105))
card(#59(#105), 1)
θ(#111, agent, #105)
borrow(#111)
from(#111, #112)
customer(#112)
card(#112, pl)

Fig. 5. Two models for comparison

In user model but not in target model:
of(#102, #105), owner(#102), accommodated(owner(#102), of(#102, #105))

In target model but not in user model:
θ(#111, agent, #105), type(#111, event), from(#111, #112), borrow(#111),
customer(#112), customer(#102)

Fig. 6. The differences between one model and another (edited)

The methods in Section 3 require us to specify in advance the kinds of error 
we anticipate. This does not have to be done on a case by case basis. We simply 
replace statements to the effect that something cannot be the case by ones 
that say that it should not be the case, so that we replaced a statement that 
claimed that ideas are not animate by one that said that there is something odd 
about animate ideas. Statements about what is possible are part of our everyday 
knowledge, and should be included in our knowledge base anyway. All we have 
done is to replace them by statements about what is likely, or reasonable, rather 
than about what is possible. The effects of such facts about the world will emerge 
when appropriate. 

The methods in Section 4 depend on having a model answer, to be compared 
with the student answer. This is not unreasonable - if a student is trying to 
acquire a specific set of terms, the teacher is likely to want to provide quite 
precise tests, and is likely to know what is expected. The task of finding the
best match between two sets of propositions, of the kind shown in Fig. 5, is
NP-complete. We have some heuristics for pruning the search, but it does take
some time to obtain Fig. 6 from Fig. 5.

These techniques, then, provide information about whether the learner has 
said something sensible, and about whether they have said what we expected. 
Exactly how such information would be used in an integrated CALL system, and 
how it should be presented to the user, is an open question. It seems plausible 
at first sight that some more graphical form of presentation might be suitable, 
but such presentations do not in fact seem to make it all that much easier to 
navigate large bodies of information. Both these issues are currently under in- 
vestigation. The present paper simply reports on the tasks involved in extracting 
this information about the content of what the learner has said.

Abstract. This paper presents the design of the knowledge-based learning
environment STyLE and the components elaborated so far. STyLE
supports learning of English terminology in the domain of finances with
a target user group of non-native English speakers. ^ The components
elaborated so far allow for the discussion of the Web-based learning
environment, the approach to building a learner model, and the
adaptive strategies for instructional and content planning depending on
the learning situation. The paper emphasises the specific aspects of
learning terminology in a second language and of checking the correctness
of the learner’s performance within STyLE.



1 Introduction 

Designing tools for learning terminology in a second language is a task deserving 
special attention. Learning specific vocabulary in a foreign language requires the 
development of natural language learning environments where learners should 
be allowed to explore the co-relations between their language capabilities and 
domain knowledge. Such environments have to provide domain knowledge to be 
used as a source for diagnosing the student’s conceptual knowledge and for
instructional planning. All this entails that a foreign language terminology learning
environment should adopt advanced language analysis methods that focus not 
only on the form but also on the meaning of the student’s input. 

Terms constitute a relatively stable and clearly determined kernel of lexical 
units in any Language for Special Purposes (LSP). It is well-known that many ba- 
sic terms have stable meaning without ambiguity in the considered domain, with 

^ STyLE (Scientific Terminology Learning Environment) is under development in the 
Copernicus’98 JRP LARFLAST (LeARning Foreign LAnguage Scientific Terminol- 
ogy), November 1999 - October 2001. Partners: CBLU, Leeds, UK; UMIST, Manch- 
ester, UK; LIRMM, Montpellier, France; Academy of Sciences, Romania; Simferopol 
University, Ukraine; Sofia University and Virtech Ltd., Bulgaria.

established typical collocations of usage in that LSP. This linguistic “stability”
explains the numerous attempts to acquire formal models, either as sophisticated
terminological lexicons and term banks (see e.g. [1]) or as ontologies of
domain knowledge ([2]). Terms and the relations between them fix (in a natural way)
the choice and granularity of the formal concepts, so the knowledge-based system
offers its user lexical and conceptual units which correspond to the user’s
intuitive fragmentation of the domain. Projects like [2] show that it is possible
to build CALL-oriented domain ontologies in complex domains (law); the “core”
ontology encodes the educational content communicated to the learner in order
to support understanding and insight in solving cases in administrative
law. Thus, the task of terminology learning exploits a relatively structured
(although extremely difficult to acquire) conceptual model. Once acquired, the
ontology, in addition to the educational content, might be exploited in two
essential ways: (i) the correctness of the student’s answer can be evaluated within
the Knowledge Base (KB) and (ii) the planning of moves in the system-user
interaction can be guided by this information.

Learning terminology in a foreign language is a strand of second language
learning. CALL applications for other (foreign) LSPs address many potential
users, by default adults with some professional demands [3]. Hence, sophisticated
adaptive systems with learner modelling and proper diagnostics are highly
desirable. Language learning presupposes that students type free
Natural Language (NL) statements, since it is unnatural to acquire a new language
by only selecting menu options. CALL systems that provide such learning
environments require Natural Language Processing (NLP) techniques to check
the correctness of the learner’s utterances. Due to the very complicated nature of the
task, however, it is difficult to find successful examples of intelligent CALL
prototypes for second language learning in general and for terminology in particular.
The state-of-the-art collection of papers on NLP in CALL [4] draws the general
conclusion that “so few of these systems have passed the concept demonstration
phase”. As applications of NLP techniques, the systems described in [4] contain
mostly modules for checking students’ competence in vocabulary, morphology,
and correct syntax usage (parsers); the most sophisticated semantic analysis is
embedded in the system BRIDGE/MILD [5], [6], [7], which matches the learner’s
utterance (a lexical conceptual structure) against the prestored expected lexical
conceptual structures. This matching is implemented by an algorithm defining
the intuitive notion of a correct match; the simple examples of semantic
correctness in [7] show that testing semantics is far beyond the progress
expected in the near future. To conclude, it seems clear that every project for
language learning (including ours) needs to be restricted by a balanced choice of
what the system gives to and expects from the learner, what the main focus is,
which AI techniques are available to provide a reaction in real time, etc.

This paper presents results obtained so far in a project where the main 
focus is improving the understanding/writing competence of the learner (adult, 
non-native English speaker) in the domain of finances. The implementation, 
the Web-based tool STyLE, follows some principles and ideas presented in [8].
Section 2 sketches the project as a whole. Section 3 presents in more detail
the diagnostic module, the learner model, the design of drills’ annotation, the 
mechanism for checking the correctness of the learner’s utterances in free NL, 
and the pedagogical agent planning local reactions in certain learning situations. 
Section 4 contains an example. Section 5 concludes the paper by discussing 
further work. 

2 LARFLAST Project Paradigm 

The project aims at the development of a Web-based learning environment where 
the student accomplishes three basic tasks (reading teaching materials, perform- 
ing test drills and discussing her own learner model with the system). The project 
is oriented to learners who need to improve their English language competence as 
well as their expertise in correct usage of English financial terms. Thus in general
we attempt to find a balance between the following goals: (i) to cover
enough domain knowledge and relevant English terms; (ii) to test students’
language and conceptual knowledge; and (iii) to find easy ways of student-system
communication and discussion of learner misconceptions by diagrammatic
representations, which are considered a powerful expressive language (the chosen
technique is Open Learner Model (OLM), see e.g. [9]). This ambitiously
formulated knowledge-based paradigm implies the necessity:

— to support an intuitive conceptual representation (providing simple graphical
visualisation of domain knowledge and learner model facts to the learner),
— to integrate formal techniques for NL understanding, allowing for analysis
of the users’ answers to drills where the student is given the opportunity to
type in free NL text (the system Parasite, developed at UMIST by Allan
Ramsay, see e.g. [10], is already integrated in STyLE).

Knowledge Base. The central knowledge resource in Larflast is a manually 
acquired Knowledge Base (KB) of conceptual graphs. Domain knowledge in fi- 
nances is kept in four formats [11]: (i) graphical, used by the knowledge engineer 
during the knowledge-acquisition phase and by OLM for communication of di- 
agrammatic representations to the learner; (ii) first order logic, applied when 
important domain facts are translated as meaning postulates to be used for 
proving the correctness of learner’s utterances by Parasite; (iii) CGIF, used for
generation of Web-pages explaining the educational content of domain knowl- 
edge in immersive context; and (iv) Prolog representation used for further KB 
processing by generalisation, specialisation, natural join, projection etc. Formats 
(ii), (iii) and (iv) are automatically generated from the primary representation (i).
Specific problems and solutions relevant to acquisition of domain knowledge are 
considered in [12]. 

Current Implementation. Fig. 1 shows already elaborated components of 
STyLE, integrated under a Web-server, and internal software communications.
The system Parasite provides checking of the morphological, syntactic and semantic
correctness of the learner’s utterances in specially designed drills, while
the prover STyLE-Parasite checks the correctness against the available domain 
knowledge. OLM is developed by the Leeds team [9]. Other STyLE components, 
which provide learning materials with interface oriented to the student/teacher 
and which are developed by the other project partners, are not shown in Fig. 1. 



[Diagram not reproduced. Recoverable labels: Submit user# and answer X; WEB
SERVER (displaying drills and teaching materials, accepting submitted answers);
PEDAGOGICAL RESOURCE BANK; KNOWLEDGE BASE OF DOMAIN KNOWLEDGE; DATA.
Legend: components; dynamic or static files.]

Fig. 1. Architecture of current STyLE components, discussed in this paper 



3 Pedagogical Resources and Their Maintenance by 
STyLE Components 

Having the main responsibility for the integration of the STyLE components,
in this paper we focus on the following issues: 

— elaboration of the pedagogical resource bank: (i) drills and their annota- 
tion: predefined drill goals, correct answers, etc.; (ii) pedagogical knowledge: 
weights of domain concepts/relations with respect to teaching, records for 
possible learners’ errors with close-semantic and close-language friends, etc. 

— design and maintenance of the learner model;

— development of the prover STyLE-Parasite, which (in drills with free NL
input from the learner) takes a logical form from Parasite’s output and
proves its domain correctness within the context of the KB facts, by matching
the given answer against the expected answer(s);

— design and development of a pedagogical module (pedagogical agent), which 
plans local reactions in certain learning situations. 

At present, the communication between the learner and STyLE is maintained 
by two main modules — Diagnostic Module (DM), which is responsible for the 
Learner Model (LM), and Pedagogical Agent (PA). DM assures analysis of the 
learner’s performance and fills in the LM; PA plans what is to be done next 
and refers, when necessary, to (i) OLM, where the learner discusses her
conceptual knowledge, and/or to (ii) the generator of dynamic web-pages, where 
the learner reads relevant texts with immersive context (this generator is not 
shown in Fig. 1). 

Currently STyLE offers test units covering about 80 basic English terms in
finances. Each test unit consists of an explanatory text about important concepts
and a set of drills of both types: with fixed-choice answers and with free-text
answers (see [13]). After the learner completes a drill with fixed-choice answers,
the results are submitted to DM, where the response interpreter analyses the
answer and computes the learner’s score. After the learner completes a drill with
free-text entry, the answer is submitted to Parasite for linguistic analysis and is
then passed to STyLE-Parasite, which proves the domain correctness of the utterance.
All information about the learner’s performance is passed to the PA, which plans
what is to be done next depending on the learning situation and calls OLM if
OLM situations have arisen after the completion of the previous drill (as in Fig. 1).

3.1 Pedagogical Resource Bank of Drills and Pedagogical 
Knowledge 

Following established practice in the web design of drills (see e.g. the
Half-baked educational software [14]), the STyLE user is presented with seven types of
drills: Multiple choice, Gap fill, Crossword, Jumbled-sentence, Matching exercise,
Ordering exercise (usually with fixed-choice answers) and Text-entry (with
free-text answers).

An annotation (internal description) is associated with every drill. It 

— shows the way the answer is to be checked: how to match the given answer
to a previously stored set of correct responses. One drill entry can be
associated with more than one possible correct answer, each with a different score;

— contains information about how different drills and their items relate to
domain concepts, encoded as KB items. A number (a weight between 0 and
10) is associated with each concept; it shows the domain importance of
this concept with respect to teaching;

— contains explicitly stated goals of drills, showing which facts and relations
concerning a given concept are being tested. More than one goal can be
associated with one and the same drill if it tests different perspectives of relations.
Possible goals are: test definition of concept X, test relations between X and
Y, test similarity between given concepts, test difference between given
concepts. All goals have weights, which are important for the planning of what is
to be shown next.

Fig. 4 shows fragments of drill annotation. The predicate test_aspect/6 
records in its 3rd-6th arguments correspondingly the correct answers (strings 
true/false in this case), the tested relation, the tested KB concept and the con- 
cept weight. 
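
To make the shape of such an annotation record concrete, here is a minimal sketch of one entry as a Python data class. Only the last four fields follow the description above; the first two fields (a drill identifier and an item number), the class name `DrillAspect` and all sample values are assumptions introduced for illustration, since the text does not say what the first two arguments of test_aspect/6 hold.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DrillAspect:
    drill: str               # assumption: drill identifier (1st argument)
    item: int                # assumption: item number in the drill (2nd argument)
    correct_answers: tuple   # 3rd argument: the accepted answers
    relation: str            # 4th argument: the tested relation
    concept: str             # 5th argument: the tested KB concept
    weight: int              # 6th argument: concept weight, 0 to 10

    def __post_init__(self):
        if not 0 <= self.weight <= 10:
            raise ValueError("weight must lie between 0 and 10")

    def is_correct(self, answer: str) -> bool:
        """Check a learner's answer against the stored correct answers."""
        return answer in self.correct_answers

# Hypothetical entry for a true/false item.
aspect = DrillAspect("drill_1", 1, ("true",), "definition", "bond", 8)
print(aspect.is_correct("true"), aspect.is_correct("false"))
```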

3.2 Diagnostic Module 

DM checks the correctness of the learner’s answer, generates feedback with 
learner’s score to the learner, fills in the LM and organises data for further 
reflective dialogue with the learner in OLM. As shown in Fig. 1, there are two 
main modules analysing learner’s performance in drills: 



Response Interpreter for Drills with Fixed-Choice Answer. It matches
the learner’s answers against the expected ones and marks all the cases where they
coincide and where they differ by asserting that the user knows, or respectively does
not know, the concept and attributes set in the goal of the drill’s item. The score
is calculated according to the number of correctly answered items in the drill.
While matching the answers, the interpreter analyses the whole history of LM
facts and, if a contradiction arises, records an “OLM-situation”.
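
The behaviour of the response interpreter can be sketched in a few lines. This is a simplified illustration, not the STyLE implementation: answers are keyed by item number, each expected answer is paired with the concept it tests, the learner's history is a mapping from concept to earlier diagnoses, and the function name `interpret_fixed_choice` is hypothetical.

```python
def interpret_fixed_choice(given, expected, history):
    """given: {item: answer}; expected: {item: (correct_answer, concept)};
    history: {concept: set of earlier diagnoses for this learner}.
    Returns (score, new LM facts, concepts that raise an OLM-situation)."""
    score, facts, olm_situations = 0, [], []
    for item, (correct, concept) in sorted(expected.items()):
        diagnosis = "know" if given.get(item) == correct else "not_know"
        if diagnosis == "know":
            score += 1
        facts.append((diagnosis, concept))
        earlier = history.get(concept, set())
        if earlier and diagnosis not in earlier:
            # The new diagnosis contradicts what was recorded before.
            olm_situations.append(concept)
    return score, facts, olm_situations

# Hypothetical drill with two true/false items.
expected = {1: ("true", "bond"), 2: ("false", "share")}
given = {1: "true", 2: "true"}
history = {"share": {"know"}}

score, facts, olm = interpret_fixed_choice(given, expected, history)
print(score, facts, olm)
```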



STyLE-Parasite Interpreter for Drills with Free-Text Answer. To provide
advanced NL understanding in cases when the learner is given the opportunity
to type in freely, Larflast integrates the system Parasite.

Parasite either recognises the learner’s utterance as correct or returns
information about the linguistic inconsistency of the utterance: morphological,
syntactic and semantic errors (in the latter case no logical form can be
computed). Answers with correct linguistic semantics are subject to further
consideration of their domain relevance, proved by STyLE-Parasite. At present
STyLE-Parasite distinguishes the following cases of wrong conceptualisation:
(i) over-generalisation, (ii) over-specification, (iii) use of the definition of a
concept instead of its name, (iv) predicates (i.e. domain facts) included in the
answer expectation but missing in the student’s answer, (v) parts of the student’s
response that lead to contradictions with the answer expectations. Relevant
information about all cases is asserted in the LM.

Domain correctness is proved in several basic steps as follows: 

- Preparation of expected answers: preliminary generation and storage as files
of the syntax trees and logical forms of all correct expected answers. Human
experts choose the "essential minimum" in each answer;

- Preparatory analysis at run-time: application of Parasite to each learner’s
answer to obtain the syntax tree and logical form of the learner’s utterance;

- Comparison of the inference sets by STyLE-Parasite: the two inference sets
(the expected one, A, and the received one, B) are compared as sets of
predicates (see Fig. 2). All predicates from B\A are recorded in a file additional.
Then B is reduced to B' = B\additional. All predicates from A\B' are
stored in a file missing and A is reduced to A' = A\missing. STyLE-Parasite
then compares A' and B'. This procedure does not change the number of
occurrences of each of the remaining predicates (with different values as
arguments) in each of the resulting sets. If the number of predicates in B'
is greater than the number in A', then after binding all of the predicates’
variables in A', redundant predicates (i.e. those predicates from B' which
still have unbound variables) are removed from B' and appended to the file
additional. If the number of predicates in A' is greater than the number
of those in B', the binding of all variables is impossible and this leads to
a contradiction;

- Search within the space of possible bindings of the free variables in A' and B'.
STyLE-Parasite applies heuristics for binding the variables in the A' and B'
predicates: the predicates with more free variables and the fewest binding
candidates have priority for binding. A contradiction causes backtracking.
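The comparison step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the STyLE-Parasite implementation: predicates are modelled as (name, args) tuples, membership in A and B is decided by predicate name alone, and the function names are hypothetical.

```python
def compare_inference_sets(expected, received):
    """Split expected (A) and received (B) predicates into A', B', additional, missing.

    Each predicate is a (name, args) tuple. Membership is decided by predicate
    name only, which is an assumption of this sketch.
    """
    names_a = {name for name, _ in expected}
    additional = [p for p in received if p[0] not in names_a]      # B \ A
    b_prime = [p for p in received if p[0] in names_a]             # B' = B \ additional
    names_b_prime = {name for name, _ in b_prime}
    missing = [p for p in expected if p[0] not in names_b_prime]   # A \ B'
    a_prime = [p for p in expected if p[0] in names_b_prime]       # A' = A \ missing
    return a_prime, b_prime, additional, missing


def binding_is_possible(a_prime, b_prime):
    """Per the text, more predicates in A' than in B' rules out a full binding."""
    return len(a_prime) <= len(b_prime)


expected = [("market", ("X",)), ("trade", ("X", "Y")), ("claim", ("Y",))]
received = [("market", ("m1",)), ("trade", ("m1", "c1")), ("bank", ("b1",))]
a_prime, b_prime, additional, missing = compare_inference_sets(expected, received)
```

In this made-up example the learner mentions a bank (recorded as additional) and omits the claim predicate (recorded as missing); the remaining predicates go on to the binding search.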

There might be several kinds of mistakes in the received answer, so the learner’s
utterances are investigated with respect to all possible error types by applying
the above-described steps. STyLE-Parasite inference is complete, since it can find
all existing ways to bind the variables. It is not necessary to find all bindings,
however, because the conclusion "correct learner utterance" is indicated after the
first correct binding, at which point the proof halts.
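A minimal sketch of such a search, halting at the first complete binding, might look as follows. It rests on several assumptions that are not in the paper: variables are strings starting with an upper-case letter, the learner's predicates are ground (no unbound variables), and the ordering heuristic is read as "fewest candidates first, ties broken by more free variables". All function names are hypothetical.

```python
def is_var(term):
    # Assumption of this sketch: variables are strings starting with an upper-case letter.
    return isinstance(term, str) and term[:1].isupper()


def unify(pattern_args, fact_args, binding):
    """Extend binding so that pattern_args matches fact_args, or return None."""
    if len(pattern_args) != len(fact_args):
        return None
    extended = dict(binding)
    for p, f in zip(pattern_args, fact_args):
        if is_var(p):
            if extended.setdefault(p, f) != f:
                return None          # variable already bound to something else
        elif p != f:
            return None              # two different constants
    return extended


def first_binding(patterns, facts, binding=None):
    """Return the first binding matching every pattern to a distinct fact, or None."""
    binding = {} if binding is None else binding
    if not patterns:
        return binding

    def candidates(pat):
        return [i for i, (name, args) in enumerate(facts)
                if name == pat[0] and unify(pat[1], args, binding) is not None]

    def free_vars(pat):
        return sum(1 for a in pat[1] if is_var(a) and a not in binding)

    # Heuristic: fewest binding candidates first, then more free variables.
    k = min(range(len(patterns)),
            key=lambda j: (len(candidates(patterns[j])), -free_vars(patterns[j])))
    pat, rest = patterns[k], patterns[:k] + patterns[k + 1:]
    for i in candidates(pat):
        extended = unify(pat[1], facts[i][1], binding)
        result = first_binding(rest, facts[:i] + facts[i + 1:], extended)
        if result is not None:
            return result            # halt at the first correct binding
    return None                      # contradiction: the caller backtracks


patterns = [("trade", ("X", "Y")), ("market", ("X",))]
facts = [("market", ("m1",)), ("trade", ("m2", "c1")), ("trade", ("m1", "c2"))]
binding = first_binding(patterns, facts)
```

Returning as soon as one binding succeeds mirrors the remark that the proof halts after the first correct binding; enumerating all bindings would only be needed to demonstrate completeness.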

3.3 Learner Model 

The LM keeps track of the learner’s performance during all sessions. After analysing
the user’s answer to each drill, the DM asserts information about her knowledge to
the user’s LM, e.g.:

know(UserName, Concept, [List_Relations], DrillName, Number).



[Figure: search space with two labelled regions, "Expected answers" and "Learner’s answer"]

Fig. 2. Space of search in STyLE-Parasite
Currently four types of diagnostics about the learner’s knowledge are asserted:
know, not_know, self_not_know and know_wrongly. The Number argument
records how many times the concept has been tested so far. This allows us to keep
track of the stability of the user’s knowledge, because we can detect gaps and
changing performance.

3.4 Link to Open Learner Model 

Analysing the learner’s answers, the DM discovers situations where either the learner
or the system needs further dialogue to elaborate the learner’s conceptual
knowledge. This entails a link to the OLM component. The following situations
are diagnosed at present:

- contradiction: there are LM facts know and not_know about the same concept.
This means that the user’s knowledge is not stable or that she does not
know some of the more complicated attributes of the concept;

- confuse close semantic concepts: the LM shows that the learner confuses
concepts marked as very closely semantically related (for example money market
and financial market). Recall that information about semantic closeness
in teaching is explicitly encoded by the domain/teaching expert in the
pedagogical resource, to point out domain concepts and relations usually confused
by novices in the domain;

- confuse close language concepts: the LM shows that the learner confuses
concepts that sound related because of the words constituting the term. These
types of confusion are typical for non-native speakers, who are misled by
phonological or linguistic similarity [3].

While in the first situation a dialogue in OLM aims at resolving the inconsistency
in the learner’s knowledge, in the other two a further learner-OLM interaction
articulates aspects of the learner’s domain knowledge and assigns possible
reasons for the learner’s errors. OLM situations are shown in Fig. 6.
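The first situation (contradiction) lends itself to a short illustration. The sketch below assumes LM facts are available as tuples mirroring the know/not_know facts of the LM; the tuple layout and the function name are hypothetical, and the other two situations are not covered because they depend on expert-encoded closeness information.

```python
from collections import defaultdict


def find_contradictions(lm_facts):
    """Return (user, concept) pairs that have both know and not_know facts.

    lm_facts: iterable of (status, user, concept, relations, drill, number).
    """
    statuses = defaultdict(set)
    for status, user, concept, *_ in lm_facts:
        statuses[(user, concept)].add(status)
    return sorted(key for key, seen in statuses.items()
                  if {"know", "not_know"} <= seen)


lm_facts = [
    ("self_not_know", "student007", "financial_market", ["object"], "drill_2", 1),
    ("know", "student007", "money_market", ["attribute"], "drill_2", 1),
    ("not_know", "student007", "money_market", ["instrument"], "drill_2", 2),
    ("know", "student007", "capital_market", ["attribute"], "drill_2", 1),
]
contradictions = find_contradictions(lm_facts)
```

With these sample facts only money_market is flagged, since it is the only concept with both a know and a not_know entry.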

3.5 Pedagogical Agent 

The main role of the PA at present is to plan the learner’s future moves between lessons
and drills. Since the considerations concern presentational as well as educational
issues, according to the terminology in [15] we would classify our planner as
performing some aspects of instructional as well as content planning. There
are two main movement strategies, local and global. The local strategy plans
moves between drills that test different characteristics of one and the same concept.
Its main goal is to create a complete view of the learner’s knowledge about this
concept. This strategy chooses drills of increasing complexity when the learner
answers correctly and repeats previously completed drills if the student has
performed poorly. The global strategy plans movements between drills that test
different concepts, according to their place in the ontology. The PA chooses the
learner’s next move depending on: (i) predefined drill goals, (ii) KB items,
(iii) concept weights and (iv) the learner’s score.
If the score is under 50% of the maximal one, the PA shows a link to readings.
If an OLM situation arises after the learner’s answer, the PA shows a link to the OLM for
further discussion. The PA always offers the default moves to the correct answers and
to the next drill or unit.
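The link-selection rules in the last three sentences can be written down directly. The following is a hedged sketch only: the function name, the move labels and the argument names are invented for illustration, and the local and global drill-selection strategies are not modelled.

```python
def suggest_moves(score, max_score, olm_situation):
    """Return the links the PA would offer after a drill (labels are illustrative)."""
    moves = ["correct answers", "next drill or unit"]   # default moves, always offered
    if score < 0.5 * max_score:                         # score under 50% of the maximum
        moves.append("readings")
    if olm_situation:                                   # e.g. a contradiction was diagnosed
        moves.append("OLM")
    return moves


weak_with_conflict = suggest_moves(score=4, max_score=10, olm_situation=True)
borderline = suggest_moves(score=5, max_score=10, olm_situation=False)
```

A score of exactly 50% is not "under 50%", so in this reading the borderline learner receives only the default moves.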

4 Example 

Figure 3 shows a fixed-choice drill and the corresponding student’s answers in
the STyLE environment. The internal drill identifier is 'drill_2'.



[Screenshot: browser window "Unit One Test Two - Microsoft Internet Explorer",
address http://larflast.bas.bg/cgi-bin/gete.exe/brows?link=unit_one_second.html&id=galia]

Which of these statements, describing financial markets, are true, and
which are false?

1. There is only one product traded on financial markets - financial claims. [don't know]

2. The money market is a place where mainly individuals and institutions with
long-term investment plans borrow funds. [false]

3. A security or loan maturing within one year or less is a money market instrument. [false]

4. The construction of factories, highways, schools and homes relies mainly on the
trading of funds on the capital market. [true]

Fig. 3. Student performance in STyLE



Learner’s answers are matched with the drill annotation shown in Figure 4. 
The learner ”student007” answers ”I don’t know” to the first item of the drill, 
which is recorded as a fact in the LM (see Figure 5). This data is used by PA 
for selection of suitable study materials. 



test_aspect(drill_2, item1, [false], [object], financial_market, 10).
test_aspect(drill_2, item2, [false], [attribute], money_market, 5).
test_aspect(drill_2, item3, [true], [instrument], money_market, 7).
test_aspect(drill_2, item4, [true], [attribute], capital_market, 5).



Fig. 4. Fragments of internal drill annotation 



Drill entries two and three are interesting because they test different types of
information about the same concept: the learner knows one of them and does
not know the other. These facts are put in the LM and an OLM situation of
the first type (contradiction) is registered. As a suggestion for the next move, the PA
generates for the learner’s web interface a web page containing, together with
the default hyperlinks, a hyperlink to the OLM and another hyperlink to readings
relevant to financial market topics.
self_not_know(student007, financial_market, [object], drill_2, 1).
know(student007, money_market, [attribute], drill_2, 1).
not_know(student007, money_market, [instrument], drill_2, 2).
know(student007, capital_market, [attribute], drill_2, 1).

Fig. 5. Status of LM obtained after the work of Response interpreter 
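How the answers of Fig. 3 and the annotation of Fig. 4 could yield the facts of Fig. 5 is sketched below. This is an assumed reading, not the Response interpreter itself: the final numeric field of test_aspect is treated as an unused weight, Number is taken to count how often a concept has been tested so far, the answer encoding ("dont_know") is invented, and know_wrongly is not modelled.

```python
from collections import Counter

# Drill annotation transcribed from Fig. 4; the last field is assumed to be a weight.
ANNOTATION = [
    ("drill_2", "item1", "false", "object", "financial_market", 10),
    ("drill_2", "item2", "false", "attribute", "money_market", 5),
    ("drill_2", "item3", "true", "instrument", "money_market", 7),
    ("drill_2", "item4", "true", "attribute", "capital_market", 5),
]


def interpret(user, answers, annotation):
    """Turn fixed-choice answers into LM facts (status, user, concept, aspects, drill, number)."""
    tested = Counter()                      # times each concept has been tested so far
    facts = []
    for drill, item, expected, aspect, concept, _weight in annotation:
        tested[concept] += 1
        answer = answers[item]
        if answer == "dont_know":
            status = "self_not_know"        # the learner declares ignorance
        elif answer == expected:
            status = "know"
        else:
            status = "not_know"
        facts.append((status, user, concept, [aspect], drill, tested[concept]))
    return facts


answers = {"item1": "dont_know", "item2": "false", "item3": "false", "item4": "true"}
facts = interpret("student007", answers, ANNOTATION)
```

Under these assumptions the four resulting tuples correspond one-to-one to the four facts listed in Fig. 5, including Number = 2 for the second test of money_market.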

5 Conclusion and Further Work 

This paper considers the components developed so far within the ongoing
Larflast project. Current results allow us to evaluate important aspects of the final
product: (i) we believe that it is possible to achieve a relatively simple but
complete ontology of 100-200 terms in the financial domain; (ii) on-line integration
of Parasite can be done in a Web environment by careful design of appropriate
drills; (iii) planning helps substantially in guiding the learner within a rich
environment where the learner is offered many choices, including free Web surfing,
and seems to be an obligatory control component.

Future work includes integration of the whole STyLE system and a relevant
user study and evaluation.
Abstract. Allowing the student to have some control over the diagnosis by
inspecting and changing the model the system has made of him is a
feasible approach in student modelling which tracks the dynamics of
student behaviour and provides for reflective learning. We present an
approach for maintaining the student model in interactive diagnosis
where a computer and a student discuss the student's knowledge.
A belief modal operator is adapted to model the knowledge of the 
learner and to help in maintaining the interaction between the computer 
system and the learner. A mechanism for finding agreements and 
conflicts between system and learner’s views is described. 



1 Introduction 

Modelling a learner’s cognitive capacity is essential for an intelligent tutoring system 
to provide individualised instruction and adaptive interaction [1]. Allowing the 
student to have some control over the diagnosis and to inspect the model the system 
has made of him is a feasible approach in student modelling [4] which tracks the 
dynamics of student behaviour [2] and provides for reflective learning [3]. A similar 
method is applicable to user modelling [5] and building adaptive systems [6]. A 
constructive interaction guided by the system where both the computer and the learner 
reflect on the learner's beliefs is the means for involving the user in diagnosis. 
Designing such an interactive process needs a dialogue management framework [7] 
and a formal engine to maintain a student model which is jointly constructed by the 
system and the learner and accumulates their views about the learner's knowledge. 

There have been few attempts to formalise the process of maintaining the user/learner
model when it is open for inspection and change directly by the user/learner, see [8,9]. In
these projects, the notion of the interaction is very constrained and the formalisations 
they offer do not consider modelling the process of interactive reflection which 
results in a jointly constructed student model. Such a task is addressed in this paper. 
We present an approach to formalising the process of maintaining a jointly 
constructed learner model in interactive diagnosis. The kernel of this approach is a 
mechanism for finding agreed beliefs and conflicts between the computer system and 
the learner when discussing the learner’s knowledge. We have employed an epistemic 
operator belief in a dialogue game interaction model and have adapted formal 
specifications from [10] into an interactive diagnostic context. 

The conception of a jointly constructed student model can be related to the notion 
of common and distributed knowledge in multi-agent systems where sound and 
complete axiomatisations have been provided [11]. However, the rationale for
adopting strong deductive approaches is debatable because of inherent problems with
computational complexity and natural plausibility in modelling human reasoning [12].
A basic assumption in interactive diagnosis is that the belief set the computer system 
has about the learner is not complete. Moreover, while the learner is expected to 
reflect on his knowledge, he may well apply unsound and incomplete reasoning, 
which the system will seek to correct. Therefore, to model interactive diagnostic 
situations we have been able to adopt some simplifications. 

Several belief models that employ nonmonotonic and limited reasoning have been 
developed to model agents' beliefs in dialogue simulations (cf. [12], [13], [14]).
Mutual beliefs of the system and the user which these systems consider, particularly 
mutual beliefs about the user’s domain beliefs, are similar to the notion of agreed 
beliefs in interactive diagnostic dialogue. The agreements play a crucial role in 
interactive diagnosis presenting the jointly constructed student model. Hence, they 
have been elaborated in our formalisation, so that not only explicit agreements but 
also implicit and assumed ones have been modelled. In addition, we define conflicts 
between the system's and the learner's views about the learner's knowledge which are 
sources for a negotiative dialogue in interactive diagnosis. This notion of conflicts is 
different from conflicts between the system's and the user's beliefs which user 
modelling frameworks use to define the user's erroneous and incomplete knowledge 
[14]. In our model, the correctness of the learner's beliefs is assessed by comparing 
the agreements about the student's beliefs with the system's domain knowledge. 

Next in the paper, there is a discussion about the process of maintaining the student 
model in interactive diagnosis and the need for a mechanism for finding agreements 
and conflicts between the computer and the learner. Such a mechanism will be 
presented in section 3 and an example of its application will be elaborated in 
section 4. Finally, further applications of our approach will be discussed. 



2 Maintaining the Student Model in Interactive Diagnosis 

We use the term interactive diagnosis to emphasise the view of diagnosis as an 
interactive process involving two agents, a diagnoser and diagnosee, who discuss and 
construct together the student model. Open learner modelling environments where the 
learner is involved in a discussion reflecting upon the model the computer has built of 
him [4] present interactive diagnostic situations. Other cases are peer diagnosis 
systems [15] where users reflect on the models they have built about their peers. 

2.1 An Example 

The example below, which will be further elaborated in section 4, is from a session
with STyLE-OLM*, an open learner modelling component in a terminology learning
environment [16]. The interaction in STyLE-OLM is grounded on a dialogue
maintenance framework [7], the communication language is based on conceptual
graphs [17] (for simplicity, here we consider propositions stated in a natural language),
and the domain knowledge base is encoded with conceptual graphs [18].

* STyLE-OLM is an open learner modelling component in the STyLE environment developed
within the LARFLAST project: http://www.larflast.bas.bg/. The example here is from a
version of STyLE-OLM in a Computer Science domain.

Initially, the learner model includes the learner’s beliefs "VISUAL BASIC is an
OBJECT-ORIENTED LANGUAGE", "VISUAL C++ is an OBJECT-ORIENTED LANGUAGE", and
"OBJECT-ORIENTED LANGUAGE contains OBJECTS". The dialogue strategy of C is to
discover a possible misconception behind the learner’s erroneous belief that VISUAL
BASIC is an OBJECT-ORIENTED LANGUAGE.

(1) C. You think that VISUAL BASIC is an OBJECT-ORIENTED LANGUAGE.

(2) S. I agree.

(3) C. You think that VISUAL BASIC is a VISUAL LANGUAGE.

(4) S. I agree.

(5) C. You think VISUAL LANGUAGES are OBJECT-ORIENTED LANGUAGES.

(6) S. I disagree.

(7) C. Do you think that VISUAL C++ is an OBJECT-ORIENTED LANGUAGE?

(8) S. Yes.

(9) C. You think that VISUAL BASIC contains OBJECTS.

(10) S. I agree.

(11) C. Do you think that VISUAL BASIC has a characteristic INHERITANCE?

(12) S. I doubt this.

(13) C. Do you know a relation between OBJECT-ORIENTED LANGUAGE and INHERITANCE?

(14) S. I don’t know.

(15) C. You think that VISUAL BASIC is an OBJECT-ORIENTED LANGUAGE because it has OBJECTS, but
OBJECT-ORIENTED LANGUAGES also have a characteristic INHERITANCE which VISUAL BASIC does not
have.

The aim of learner model maintenance is to find out what is to be included in
the learner model after the interaction.



2.2 Maintaining the Student Model in Interactive Diagnosis -
Main Components

The learner model maintenance process has three main parts (Fig. 1):

□ Ascribing participants’ beliefs from speech acts;

□ Inferring what has been agreed through the interaction;

□ Ascribing a level of correctness to learner’s beliefs and updating the learner model.




Fig. 1. Maintaining the student model in interactive diagnosis 
We have shown elsewhere that a dialogue management framework based on
dialogue games [19] can be adapted for maintaining communication in interactive 
diagnosis [7]. Following a dialogue game model, commitment rules define the effects 
of moves upon the dialogue participants' commitment stores. These rules ascribe 
changes in participants’ belief stores after a speech act is uttered. The ascribed beliefs 
are not added directly to the student model but accumulated in temporarily built 
commitment stores to keep different viewpoints, which is essential in providing the 
notion of a collaborative dialogue [3]. To maintain the dynamics of the belief stores, a 
belief revision approach similar to the one described in [8] is employed. 

Being responsible for maintaining the student model which is to be used later by 
other components of the learning environment, an interactive diagnostic module has 
to have a mechanism for finding what has been agreed through the interaction which 
is the kernel of a jointly constructed student model. After an interaction episode 
finishes, i.e. the participants agree to change the focus of the discussion, an inference 
mechanism is to be adopted to refine the beliefs of participants and to find out what 
these agents have agreed about the learner's beliefs. The agreed beliefs are to be 
attributed a level of correctness by comparison with the expert's knowledge and then 
used as a source for updating the learner model. The inference mechanism also has to 
detect which are the conflict points in the participants’ belief stores. In the following 
interactions, these points are to be used as the negotia in a negotiative dialogue game. 
A possible formalisation of finding agreements and conflicts and maintaining a jointly 
constructed learner model is described in the rest of the paper. 



3 Finding Agreements and Conflicts about the Student’s Beliefs 

3.1 Assumptions 

Our starting point is that the learner’s domain knowledge is modelled in terms of
beliefs that correspond to the conceptual level of the student model. We consider a
learner model with three parts: $\mathcal{B}$, a set of learner's beliefs that consists of
correct, erroneous and incomplete beliefs; a set of misunderstanding rules that
represent patterns of potential conversational failures; and a set of misconception
rules that define possible reasons for the learner’s erroneous and incomplete beliefs.
We consider the two rule sets useful for the diagnoser to plan the interaction, and
open for discussion. Hereafter, we use two agents: a student s and a computer c.

We will use an epistemic operator $B_s(\varphi)$ to denote that the student believes a fact
represented by the propositional formula $\varphi$. We also use the negation $\neg B_s(\varphi)$ to denote
that the student does not believe $\varphi$ (he may not believe $\neg\varphi$ either, which is written
$\neg B_s(\neg\varphi)$).

For the computer, expressions $B_c(\varphi)$ and $B_c(\neg\varphi)$ present its domain expertise and
are derived from the system knowledge base. We also consider nested beliefs to
represent the beliefs that the computer has about the student. Thus $B_c(B_s(\varphi))$ denotes
that c believes that s believes $\varphi$. Its negations are: $B_c(B_s(\neg\varphi))$, c believes that s believes
$\neg\varphi$; $B_c(\neg B_s(\varphi))$, c believes it is not the case that s believes $\varphi$; and $\neg B_c(B_s(\varphi))$, c does not
believe that s believes $\varphi$. Such expressions build the student model in traditional
diagnosis, where the computer has overall control over the diagnostic process. The
computer opens these beliefs for a discussion with the learner in interactive diagnosis.
Such expressions appear in the computer commitment store throughout the dialogue.

Note that the learner’s beliefs about the system’s beliefs have been considered redundant.
We assume that in the conversation the learner reflects on his beliefs or challenges
those from the system’s commitment store, e.g. when the learner agrees with $B_c(B_s(\varphi))$
the corresponding commitment rule will add $B_s(\varphi)$ to the learner belief store
and when he challenges $B_c(B_s(\varphi))$, $\neg B_s(\varphi)$ will be ascribed.

Before the interaction, we will consider a base set of computer beliefs $I_c$, a base set
of computer beliefs about student beliefs $I_{cs}$ and a base set of student beliefs $I_s$. The
beliefs in $I_c$ are encoded by a domain expert and those in $I_{cs}$ and $I_s$ are either stated
explicitly or assigned by some initial domain specific inference.



3.2 Reasoning 

We consider our agents to have a (not necessarily complete) set of inference rules that
we will call reasoners. We will denote the student reasoners over his beliefs by $\mathcal{R}_s$
and the computer reasoners over its beliefs by $\mathcal{R}_c$. The computer will also have
reasoners about the beliefs of the student, $\mathcal{R}_{cs}$.

The rule $\varphi_1, \ldots, \varphi_n \vdash_{\mathcal{R}} \varphi$ will allow us to infer new beliefs from agents' belief sets. For
example, $B_c(\varphi_1), \ldots, B_c(\varphi_n) \vdash_{\mathcal{R}_c} B_c(\varphi)$ will allow us to assume that for the computer
$B_c(\varphi)$ is true if $B_c(\varphi_1), \ldots, B_c(\varphi_n)$ are true. The computer might do some inference over
its beliefs about the student. For instance, the rule

$B_c(B_s(\varphi)),\ B_c(B_s(\varphi) \Rightarrow B_s(\psi)) \vdash_{\mathcal{R}_{cs}} B_c(B_s(\psi))$

defines some default reasoning assumptions made by the computer about the learner's
reasoning. $\mathcal{R}_s$ is the least determined set presenting the learner's reasoning over his
beliefs. An example of a possible rule from $\mathcal{R}_s$ is given below.

$B_s(\varphi),\ B_s(\varphi \Rightarrow \psi) \vdash_{\mathcal{R}_s} B_s(\psi)$

We will infer new agents’ beliefs by applying their reasoners only once. As
discussed in [10], there is a certain rationale in adopting limitations. Humans tend not
to draw all possible conclusions from their beliefs. Also, it might be considered
peculiar to assume that an agent believes a proposition which needs a fairly long
inference process upon the agent’s beliefs in order to be ascribed as true.

We can now define the agents' belief sets that will be the basis for deriving agents'
agreements and conflicts.

$\mathcal{B}_c = \{\varphi \mid \varphi \in I_c \text{ or } \exists\, \varphi_1, \ldots, \varphi_n \vdash_{\mathcal{R}_c} \varphi,\ \varphi_i \in I_c,\ i = 1 \ldots n\}$

$\mathcal{B}_s = \{\varphi \mid \varphi \in I_s \text{ or } \exists\, \varphi_1, \ldots, \varphi_n \vdash_{\mathcal{R}_s} \varphi,\ \varphi_i \in I_s,\ i = 1 \ldots n\}$

$\mathcal{B}_{cs} = \{\varphi \mid \varphi \in I_{cs} \text{ or } \exists\, \varphi_1, \ldots, \varphi_n \vdash_{\mathcal{R}_{cs}} \varphi,\ \varphi_i \in I_{cs},\ i = 1 \ldots n\}$
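The effect of applying reasoners only once can be illustrated with a small sketch. It assumes ground rules given as (premises, conclusion) pairs over hashable belief formulas; this representation and the function name are choices made here for illustration, not taken from the paper.

```python
def apply_once(base, rules):
    """Return base plus every conclusion whose premises all lie in base.

    Rules are checked against the base set only, so a derived belief never
    feeds a second inference step (reasoners are applied a single time).
    """
    derived = {conclusion for premises, conclusion in rules
               if all(p in base for p in premises)}
    return set(base) | derived


base = {"a", "a=>b"}
rules = [
    (("a", "a=>b"), "b"),   # fires: both premises are base beliefs
    (("b",), "c"),          # does not fire: "b" is only a derived belief
]
belief_set = apply_once(base, rules)
```

Here "b" is added but "c" is not, which is the limitation the text motivates: beliefs that need a longer inference chain are not ascribed to the agent.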




226 Vania Dimitrova et al. 



3.3 Agreements 

A distinguishing characteristic of an interactive diagnostic dialogue is that agents are 
talking about the beliefs of one of them. This entails a proper adjustment of the 
mechanism for searching for what has been agreed throughout the interaction. 
Following a commonsense view that agents agree with something if they have the 
same opinion about it, we identify the agreements in interactive diagnosis as being 
those beliefs of the student that the computer assumes the student believes and the 
student himself accepts. 

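We will consider three groups of agreements. Explicit agreements $\mathcal{A}_{explicit}$ can be
found by simply matching the beliefs in the initial belief stores.

$\mathcal{A}_{explicit} = \{\varphi \mid \varphi \in I_s \ \&\ B_c(\varphi) \in I_{cs}\}$

In most cases, there are few explicit agreements. To define implicit
agreements $\mathcal{A}_{implicit}$ we will consider agents' reasoners and will match the beliefs in
$\mathcal{B}_s$ and $\mathcal{B}_{cs}$.

$\mathcal{A}_{implicit} = \{\varphi \mid \varphi \in \mathcal{B}_s \ \&\ B_c(\varphi) \in \mathcal{B}_{cs}\}$

People tend to agree when they do not contradict one another. Following an
autoepistemic notion, we define assumed agreements:

$\mathcal{A}_{assumed} = \{\varphi \mid ((\varphi \in \mathcal{B}_s) \ \&\ (B_c(\varphi) \text{ does not contradict } \mathcal{B}_{cs})) \text{ or } ((B_c(\varphi) \in \mathcal{B}_{cs}) \ \&\ (\varphi \text{ does not contradict } \mathcal{B}_s))\}$.

We will consider that a belief formula $\varphi$ contradicts a set of belief formulas $\Phi$ if the
negation of $\varphi$ belongs to $\Phi$.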
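The three groups of agreements can be computed with plain set comprehensions. The sketch below is an illustration under assumptions: belief formulas are encoded as nested tuples, negation toggles an outer "not", and all names are hypothetical. Read literally, the assumed group also contains the implicit agreements, so it is the union of the three groups that matters.

```python
def Bs(f):
    return ("B", "s", f)        # the student believes f


def Bc(f):
    return ("B", "c", f)        # the computer believes f


def neg(f):
    # Negation toggles an outer "not" (an encoding chosen for this sketch).
    return f[1] if isinstance(f, tuple) and f[0] == "not" else ("not", f)


def contradicts(f, formulas):
    """f contradicts a set of formulas if its negation belongs to the set."""
    return neg(f) in formulas


def agreements(I_s, I_cs, S_s, S_cs):
    """Return (explicit, implicit, assumed) agreements about the student's beliefs."""
    explicit = {f for f in I_s if Bc(f) in I_cs}
    implicit = {f for f in S_s if Bc(f) in S_cs}
    assumed = ({f for f in S_s if not contradicts(Bc(f), S_cs)}
               | {f[2] for f in S_cs
                  if f[:2] == ("B", "c") and not contradicts(f[2], S_s)})
    return explicit, implicit, assumed


I_s = {Bs("p1"), neg(Bs("p4"))}
I_cs = {Bc(Bs("p1"))}
S_s = I_s | {Bs("p6")}           # belief sets after reasoners are applied
S_cs = I_cs | {Bc(Bs("p6"))}
explicit, implicit, assumed = agreements(I_s, I_cs, S_s, S_cs)
joint_model = explicit | implicit | assumed
```

In this made-up case the student's disbelief in p4 is an assumed agreement only: the computer holds no belief about it, so nothing contradicts it.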

3.4 Conflicts 

Conflicts in interactive diagnosis are sources for negotiation and in most cases
help in articulating the learner’s domain beliefs. In accordance with the definitions of
agreements, we will define conflicts between the computer's and the student’s views about
the student's beliefs.

Explicit conflicts $\mathcal{C}_{explicit}$ will be obtained by matching the beliefs in agents’ initial
belief sets.

$\mathcal{C}_{explicit} = \{\varphi \mid ((\varphi \in I_s) \ \&\ (B_c(\varphi) \text{ contradicts } I_{cs})) \text{ or } ((B_c(\varphi) \in I_{cs}) \ \&\ (\varphi \text{ contradicts } I_s))\}$

Likewise, we will define implicit conflicts $\mathcal{C}_{implicit}$ by considering all beliefs derived
after agents’ reasoners are applied. An ensuing definition of assumed conflicts would
appear redundant because of its similarity with $\mathcal{C}_{implicit}$.

$\mathcal{C}_{implicit} = \{\varphi \mid ((\varphi \in \mathcal{B}_s) \ \&\ (B_c(\varphi) \text{ contradicts } \mathcal{B}_{cs})) \text{ or } ((B_c(\varphi) \in \mathcal{B}_{cs}) \ \&\ (\varphi \text{ contradicts } \mathcal{B}_s))\}$
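Both conflict definitions share one shape and differ only in which belief sets are passed in, so a single function suffices for a sketch. As before, the nested-tuple encoding of formulas and the function names are assumptions made for illustration.

```python
def Bs(f):
    return ("B", "s", f)        # the student believes f


def Bc(f):
    return ("B", "c", f)        # the computer believes f


def neg(f):
    # Negation toggles an outer "not" (an encoding chosen for this sketch).
    return f[1] if isinstance(f, tuple) and f[0] == "not" else ("not", f)


def contradicts(f, formulas):
    """f contradicts a set of formulas if its negation belongs to the set."""
    return neg(f) in formulas


def conflicts(student_set, computer_set):
    """Conflicts between the student's beliefs and the computer's beliefs about them.

    Pass the initial sets for explicit conflicts, or the sets obtained after
    applying the reasoners for implicit conflicts.
    """
    return ({f for f in student_set if contradicts(Bc(f), computer_set)}
            | {f[2] for f in computer_set
               if f[:2] == ("B", "c") and contradicts(f[2], student_set)})


student = {Bs("p1"), neg(Bs("p4"))}
computer = {Bc(Bs("p1")), Bc(Bs("p4"))}
found = conflicts(student, computer)
resolved = conflicts(student, computer - {Bc(Bs("p4"))})
```

The conflict concerns the formula Bs("p4"): the computer assumes the student holds it while the student has denied it. Removing the computer's assumption resolves the conflict.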
3.5 Jointly Constructed Student Model

The jointly constructed student model (we consider only the belief part open for a
discussion) consists of all agreements between the computer and the student.

$\mathcal{B} = \mathcal{A}_{explicit} \cup \mathcal{A}_{implicit} \cup \mathcal{A}_{assumed}$



3.6 Assigning Degree of Correctness to the Learner's Beliefs 

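In the previous sections we described how $\mathcal{B}$, a jointly constructed learner model,
will be obtained. The learner's beliefs are assigned categories of correctness. This can
be done by comparing the beliefs in $\mathcal{B}$ with the system domain beliefs $\mathcal{B}_c$. We
consider the following categories:

$\varphi$ is a correct belief iff $(B_s(\varphi) \in \mathcal{B}) \ \&\ (B_c(\varphi) \in \mathcal{B}_c)$.

$\varphi$ is an erroneous belief iff $(B_s(\varphi) \in \mathcal{B}) \ \&\ ((B_c(\neg\varphi) \in \mathcal{B}_c) \text{ or } (B_c(\varphi) \notin \mathcal{B}_c))$.

$\varphi$ is an incomplete belief iff $(B_c(\varphi) \in \mathcal{B}_c) \ \&\ ((B_s(\neg\varphi) \in \mathcal{B}) \text{ or } (\neg B_s(\varphi) \in \mathcal{B}) \text{ or } (B_s(\varphi) \notin \mathcal{B}))$.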
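The three categories translate into membership tests on the jointly constructed model and the system's domain beliefs. The sketch below assumes the same nested-tuple encoding of formulas used for illustration elsewhere; the function returns a list because, read literally, the definitions are not mutually exclusive. The sample beliefs are invented test data.

```python
def Bs(f):
    return ("B", "s", f)        # the student believes f


def Bc(f):
    return ("B", "c", f)        # the computer believes f


def neg(f):
    # Negation toggles an outer "not" (an encoding chosen for this sketch).
    return f[1] if isinstance(f, tuple) and f[0] == "not" else ("not", f)


def classify(p, joint_model, system_beliefs):
    """Return the correctness categories that proposition p falls into."""
    s_believes = Bs(p) in joint_model
    c_believes = Bc(p) in system_beliefs
    categories = []
    if s_believes and c_believes:
        categories.append("correct")
    if s_believes and (Bc(neg(p)) in system_beliefs or not c_believes):
        categories.append("erroneous")
    if c_believes and (Bs(neg(p)) in joint_model
                       or neg(Bs(p)) in joint_model
                       or not s_believes):
        categories.append("incomplete")
    return categories


system_beliefs = {Bc("p6"), Bc("p7")}                  # assumed domain expertise
joint_model = {Bs("p1"), Bs("p6"), neg(Bs("p7"))}      # assumed agreed beliefs
labels = {p: classify(p, joint_model, system_beliefs) for p in ("p1", "p6", "p7")}
```

A proposition the student holds but the system does not is erroneous, one both hold is correct, and one the system holds but the student does not is incomplete.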

In this section, by adopting an epistemic operator belief and employing agents that do
not reason fully deductively, we have been able to provide a mechanism for
finding agreements and conflicts between a computer and a student when they discuss
the student's knowledge. This allows a jointly constructed learner
model to be maintained. The next section illustrates the use of the mechanism in an interactive
diagnostic situation.



4 The Example Analysed 



We will now examine the example in section 2 by applying the mechanism described 
in section 3. The example has been elaborated extensively to show more aspects of 
the approach. To make the analysis simpler we will denote: 

$p_1$ = VISUAL BASIC is an OBJECT-ORIENTED LANGUAGE.

$p_2$ = VISUAL C++ is an OBJECT-ORIENTED LANGUAGE.

$p_3$ = OBJECT-ORIENTED LANGUAGE contains OBJECTS.

$p_4$ = VISUAL LANGUAGES are OBJECT-ORIENTED LANGUAGES.

$p_5$ = VISUAL BASIC is a VISUAL LANGUAGE.

$p_6$ = VISUAL BASIC contains OBJECTS.

$p_7$ = OBJECT-ORIENTED LANGUAGE has a characteristic INHERITANCE.

$p_8$ = VISUAL BASIC has a characteristic INHERITANCE.

In STyLE-OLM, computer reasoners $\mathcal{R}_c$ are based on conceptual graphs rules of
inference [17]. Some rules from $\mathcal{R}_{cs}$ used in the example are

$B_c(B_s(\varphi) \Rightarrow B_s(\psi)),\ B_c(B_s(\varphi)) \vdash_{R_{cs1}} B_c(B_s(\psi))$;

$B_c(B_s(\varphi) \Rightarrow B_s(\psi)),\ B_c(B_s(\psi)) \vdash_{R_{cs2}} B_c(B_s(\varphi))$.

The beliefs in $I_c$ are encoded by a domain expert in a conceptual graphs knowledge base. In the
example, before the dialogue

$I_{cs} = \{B_c(B_s(p_1)),\ B_c(B_s(p_2)),\ B_c(B_s(p_3))\}$.

Consulting its knowledge base, STyLE-OLM will discover that $B_c(B_s(p_1))$ is an
erroneous belief: an individual has been wrongly assigned to a class. Then the
system will search for possible learner’s misconceptions to explain this
misclassification. There are two potential candidates.

Misclassification_1. The individual has common features with an individual that
belongs to the class, i.e. VISUAL BASIC is an OBJECT-ORIENTED LANGUAGE because it
is a VISUAL LANGUAGE like VISUAL C++, which is an OBJECT-ORIENTED LANGUAGE.

Here, the system will start with $p_2$ and by generalising it to $p_4$ will assume that
$B_c(B_s(p_2) \Rightarrow B_s(p_4))$. Then, it will make a hypothesis that $B_c(B_s(p_4))$ by applying the rule
$R_{cs1}$.

Applying restriction over $p_4$, the system will infer $p_5$ and will assume that
$B_c(B_s(p_4) \ \&\ B_s(p_5) \Rightarrow B_s(p_1))$. Then, it will make a hypothesis that $B_c(B_s(p_5))$ by applying
the default rule $R_{cs2}$.

Misclassification_2. The individual has features that are part of the class features,
i.e. VISUAL BASIC is an OBJECT-ORIENTED LANGUAGE because it contains OBJECTS.

Applying restriction over $p_3$, the system will infer $p_6$ and will assume that
$B_c(B_s(p_3) \ \&\ B_s(p_6) \Rightarrow B_s(p_1))$. Then, it will make a hypothesis that $B_c(B_s(p_6))$ by applying
the default rule $R_{cs2}$.

Thus, before the interaction the belief sets will be:

$\mathcal{B}_s = I_s = \{B_s(p_1),\ B_s(p_2),\ B_s(p_3)\}$

$\mathcal{B}_{cs} = \{B_c(B_s(p_1)),\ B_c(B_s(p_2)),\ B_c(B_s(p_3)),\ B_c(B_s(p_4)),\ B_c(B_s(p_5)),\ B_c(B_s(p_6))\}$

During the interaction more beliefs will be added as follows.

(4) will add $B_s(p_5) \in I_s$.

(6) will bring $\neg B_s(p_4) \in I_s$ and a challenge to $B_c(B_s(p_2)) \in \mathcal{B}_{cs}$ that will imply the
explicit question in (7).

(8) will show that $B_c(B_s(p_2))$ will remain in $\mathcal{B}_{cs}$ but $B_c(B_s(p_4))$ will be deleted.
Misclassification_1 will be withdrawn.

(10) will bring $B_s(p_6) \in I_s$.

(11) assuming $\neg B_c(B_s(p_8)) \in I_{cs}$, the computer asks about this explicitly.

(12) will show that $\neg B_s(p_8) \in I_s$.

(13) following the answer in (12), the computer will assume $B_c(\neg B_s(p_7)) \in I_{cs}$ and
will aim at checking it.

(14) will confirm $\neg B_s(p_7) \in I_s$ and misclassification_2.

(15) will inform about the discovered misclassification.

Therefore, after the interaction

$\mathcal{B}_s = I_s = \{B_s(p_1),\ B_s(p_2),\ B_s(p_3),\ B_s(p_5),\ B_s(p_6),\ \neg B_s(p_4),\ \neg B_s(p_7),\ \neg B_s(p_8)\}$

$I_{cs} = \{B_c(B_s(p_1)),\ B_c(B_s(p_2)),\ B_c(B_s(p_3)),\ B_c(\neg B_s(p_7)),\ \neg B_c(B_s(p_8))\}$

$\mathcal{B}_{cs} = \{B_c(B_s(p_1)),\ B_c(B_s(p_2)),\ B_c(B_s(p_3)),\ B_c(B_s(p_5)),\ B_c(B_s(p_6)),\ B_c(\neg B_s(p_7)),\ \neg B_c(B_s(p_8))\}$

$\mathcal{A}_{explicit} = \{B_s(p_1),\ B_s(p_2),\ B_s(p_3),\ \neg B_s(p_7)\}$

$\mathcal{A}_{implicit} = \{B_s(p_1),\ B_s(p_2),\ B_s(p_3),\ B_s(p_5),\ B_s(p_6),\ \neg B_s(p_7)\}$

$\mathcal{A}_{assumed} = \{\neg B_s(p_4),\ \neg B_s(p_8)\}$

$\mathcal{B} = \{B_s(p_1),\ B_s(p_2),\ B_s(p_3),\ B_s(p_5),\ B_s(p_6),\ \neg B_s(p_4),\ \neg B_s(p_7),\ \neg B_s(p_8)\}$

In addition, during the interaction a misclassification of type 2 was discovered as
an explanation of $B_s(p_1)$.

In this example, explicit conflicts have not been discovered. An implicit conflict
about $B_s(p_4)$ was discovered and consequently overcome by deleting $B_c(B_s(p_4))$
from $\mathcal{B}_{cs}$.

Before being added to the learner model, the beliefs from $\mathcal{B}$ will be assigned a
level of correctness by comparing them with the knowledge in the system domain model.
For example, $p_1$ will be considered an erroneous belief, $p_7$ will be assigned as
incomplete and $p_6$ as a correct belief.

The example has illustrated how the mechanism described in section 3 maintains a 
jointly constructed learner model in an interactive diagnostic situation. Before the 
interaction, the learner model includes very constrained beliefs about the learner. 
Throughout the interaction, using its reasoners, the computer system made some 
hypotheses about learner's beliefs and asked him for verification. At the end of the 
dialogue, finding system and student's agreements, a more elaborated learner model 
has been obtained and a learner's misconception has been discovered. 



5 Conclusion 

In this paper, we have presented an approach for maintaining a jointly constructed 
student model in interactive diagnosis. We have adapted a belief modal operator to 
model the knowledge of the learner and to maintain the interaction between the 
computer system and the learner. The nature of interactive diagnosis where agents' 
reasoning is not complete and not necessarily sound has allowed us to explore several 
simplifications and to avoid problems due to computational complexity. 

The applicability of the mechanism for maintaining a jointly constructed student
model has been demonstrated in STyLE-OLM, an interactive diagnosis component in
a terminology learning environment.

A substantial insight from formalisation in intelligent systems is that the models 
developed for one application can easily be employed in another context. The 
mechanism for maintaining a jointly constructed learner model described above has 
been adjusted to natural situations of human reasoning. This brings practical 
advantages of the approach making feasible its extension in peer diagnosis situations 
where two learners discuss the knowledge of one of them. In this context, the 
mechanism will provide a computer system with an engine to build models of the 
peers as well as to mediate the interaction between them. Another potential dimension 
for future investigations is a possible extension of the mechanism to modelling 
agreements and conflicts in collaborative dialogues. 
Abstract. An approach for the usage of metaphors in learning terminology
in a foreign language is presented. The approach integrates ideas
from the metaphor theory of Lakoff and Johnson [6] with techniques
from corpus-based computational linguistics, advanced markup languages
on the web (XML and XSL), and knowledge acquisition and
processing. Metaphors are identified by semi-automatic methods and
are annotated according to domain and metaphor ontologies. The corpus
with annotated metaphors, together with the domain and metaphor
ontologies, is used for the dynamic generation of personalized web
pages.

Keywords: Metaphors, WWW, XML, Ontology, Knowledge-Based 
Systems, Corpus Linguistics. 



1 Introduction 

This paper presents a methodology and an associated knowledge-based framework, 
developed on the web, for helping foreign students learn non-literal, metaphorical 
phrases. One of the main ideas is to add a new dimension to the ontology-based 
(knowledge-based) systems now available on the web (e.g. [16]). This new dimension is 
provided by the experientialist theory of metaphor of Lakoff and Johnson [6], in 
which metaphors are considered to play an essential role in understanding, in contrast to 
traditional ontology-based systems that use subcategorization: “subcategorization and 
metaphors are two endpoints of a continuum” [6]. 

The approach was tested in a system developed for the generation of highly 
structured World Wide Web pages for learning finance terminology. This system is 
one of the modules of the Copernicus project LarFLaST [9], whose main objective is 
to provide a set of tools, available on the web, for supporting Romanian, Bulgarian 
and Ukrainian students in learning foreign terminology in finance. A study on 
learning foreign terminology in finance, performed in LarFLaST [15], has shown that 
collocations and metaphors play an important role in understanding new concepts in 
finance. Metaphors are often used to give insight into what a concept means, as in 
the following example: “Stocks are very sensitive creatures” [11]. Such insight 
cannot be obtained in knowledge-based approaches centered around taxonomic 
ontologies. For example, these systems will explain the concept “stock” in terms of 
its super-concepts like “securities”, “capital”, “asset” or “possession”. Its 
attributes and relations with other concepts may provide more details. This paper 
describes an alternative approach, for the identification, annotation and usage of 
metaphors in corpora as a basis for explanations giving the above-mentioned insight. 

S. A. Cerri and D. Dochev (Eds.): AIMSA 2000, LNAI 1904, pp. 232-241, 2000. 
© Springer-Verlag Berlin Heidelberg 

Metaphor Processing for Learning Terminology on the Web 

The approach presented in the paper integrates metaphor processing with ideas 
from knowledge acquisition and processing (text mining techniques for metaphor 
identification, knowledge-based web page generation [14]), corpus-based 
computational linguistics, and advanced markup languages on the web (XML [18] 
annotation of metaphors and visualization with XSL [18]). 

The next section presents the basic ideas of ontologies and their role as a scaffold 
in intelligent programs for supporting learning. Section 3 introduces some ideas of 
Lakoff and Johnson’s theory on metaphor [6] and discusses the role of metaphors in 
language understanding. An ontology for metaphors is introduced in section 4. The 
way metaphors may be processed is analyzed in section 5. A final section is dedicated 
to some conclusions and to comparisons to other approaches on metaphor processing. 



2 Domain Ontology, a Scaffold of a Learning Process 

Learning is a knowledge-centered activity: one of the main goals of a learning process 
is the articulation of a body of knowledge for the considered domain. The skeleton of 
this body is usually a semantic network of the main concepts involved in that domain. 
These concepts are taxonomically organized and have several attributes and relations 
connecting them with other concepts. In other words, the learner must articulate in 
his mind the ontology of the domain (that he wants to conceptualize): 

"An ontology is a specification of a conceptualization. ...That is, an ontology is a 
description (like a formal specification of a program) of the concepts and relationships 
that can exist for an agent or a community of agents". [5]. 

One common way in which knowledge-based programs support learning is to 
“build” the domain ontology in the mind of the student. The program incrementally 
introduces new concepts from the ontology to the student and tests whether he has 
acquired them correctly. From these tests, a model of what the student has acquired is 
kept in order to decide which concepts will be considered next. This process is also 
used in the LarFLaST project [9]. 

For example, the finance (domain) ontology contains the following fragment of 
concept taxonomy: 

Security 
    Bond 
    Stock 
        Common-Stock 
        Preferred-Stock 

Each concept has attributes. For example, a stock may have the following attrib- 
utes: 

□ issuing date, 

□ maturity date, 

□ dividends. 

Each concept may be related to other concepts. Terms related to share are: 

□ the stockholder, 

□ the issuer. 
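
The fragment above (taxonomy, attributes, related terms) can be sketched as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class `Concept`, its field names, the helpers `ancestors` and `inherited_attributes`, the nesting of the taxonomy and the attachment of the related terms to `Stock` are all hypothetical and are not taken from the LarFLaST implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """One node of the domain taxonomy (hypothetical representation)."""
    name: str
    parent: Optional[str] = None  # super-concept, None for the root
    attributes: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)


# Fragment of the finance ontology given in the text. The nesting and the
# placement of the related terms on "Stock" are assumptions.
ONTOLOGY: Dict[str, Concept] = {
    c.name: c
    for c in [
        Concept("Security"),
        Concept("Bond", parent="Security"),
        Concept("Stock", parent="Security",
                attributes=["issuing date", "maturity date", "dividends"],
                related=["stockholder", "issuer"]),
        Concept("Common-Stock", parent="Stock"),
        Concept("Preferred-Stock", parent="Stock"),
    ]
}


def ancestors(name: str) -> List[str]:
    """Return the chain of super-concepts, nearest first."""
    chain = []
    parent = ONTOLOGY[name].parent
    while parent is not None:
        chain.append(parent)
        parent = ONTOLOGY[parent].parent
    return chain


def inherited_attributes(name: str) -> List[str]:
    """Collect the attributes of a concept and of all its super-concepts."""
    attrs = list(ONTOLOGY[name].attributes)
    for ancestor in ancestors(name):
        attrs.extend(ONTOLOGY[ancestor].attributes)
    return attrs
```

With this representation, a sub-concept such as `Preferred-Stock` obtains the attributes of `Stock` by walking up the taxonomy, which is the kind of classification knowledge a taxonomic ontology can offer.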

Ontologies are now available on the web. Ontologies like WordNet [16], 
EuroWordNet or MikroKosmos [10] provide a lot of useful relations between concepts. 
For example, WordNet offers hypernyms (super-concepts), hyponyms (sub-concepts) 
and other related concepts. 

In LarFLaST, the WordNet answers to queries about “stock” are filtered by the 
teacher in order to select the right sense. In the web page generation phase (see 
figure 1), the result (which includes important parts of the WordNet answer) looks 
like the following example: 

stock - is the capital raised by a corporation through the issue of shares entitling 
holders to partial ownership; "he owns a controlling share of the company's stock" 
    is a/an capital, working capital - is assets available for use in the production 
    of further assets 
        is a/an asset - is anything of material value or usefulness 
            is a/an possession - is anything owned or possessed 

stock may be of the following types: 

    common stock, common shares, ordinary shares - is stock other than preferred 
    stock 

    no-par-value stock, no-par stock - is a stock with no par value specified in the 
    corporate charter or on the stock certificate 

    preferred stock, preferred shares, preference shares - is a stock whose holders 
    are guaranteed priority in the payment of dividends 



3 The Role of Metaphors in Language Understanding 

The information provided by the domain or WordNet ontologies (as depicted in the 
example that ended the last section) is very useful for understanding the “stock” 
concept. However, for a deep understanding of what a stock means, classification 
knowledge is not enough; the experience of other people who deal with stocks 
is also very important. Metaphors are a powerful means of bridging this gap (e.g. 
"stocks are very sensitive creatures" [11]). This idea is also reinforced by the results 
of a study in the LarFLaST project on the problems of learning foreign finance 
terminology: 

"... the difficulty of "cracking" metaphors. The language of Economics and Fi- 
nance is extremely metaphorical and sometimes clusters of metaphors result in rather 
elaborated images. Very often this is metaphorical elaboration of an everyday word, 
otherwise completely familiar." [15] 

Lakoff and Johnson have developed an influential theory of metaphor [6]. They 
consider that “subcategorization and metaphors are two endpoints of a continuum”, 
and that metaphors "... form coherent systems in terms of which we conceptualize our 
experience" [6]. One consequence is that metaphors offer expressive means different 
from those of traditional ontology-based systems that use subcategorization. This is 
related to the idea that the process of understanding something implies an empathic 
relation [17], which involves the immersion of the learner in a context: 

"empathy is a phenomenon in which one person can experience states, thoughts and 
actions of another person, by psychological transposition of the self in an objective 
human behavior model, allowing the understanding of the way the other interprets the 
world" [7] 

The side effect of trying to understand a metaphor is an empathic process, a kind of 
immersion: “The essence of metaphor is understanding and experiencing one kind of 
thing in terms of another” [6]. For example, the metaphor "stocks are very sensitive 
creatures" [11] gives us valuable insight into the behavior and characteristics 
of stocks; we can even understand them by comparing them to ourselves, as very 
sensitive creatures. The above considerations also explain some other mistakes made 
by foreigners learning finance terminology in English, namely in recognizing typical 
collocations, which often have a metaphorical background (e.g., "to sustain a loss"): 

"An inability to recognize and co-ordinate typical noun-verb collocations within a 
given context. Frequently a term is used with a specific verb, which is not a term itself. 
These collocations are very typical for economics texts in general, e.g. to sustain a 
loss, to bear the loss, liable for debts, to repay a loan when it is due, etc.". 



4 A Metaphor Ontology 

Lakoff and Johnson refer to metaphors as a correspondence between two kinds of 
things, for example: “ARGUMENT IS WAR”. The first term (“argument”) is the 
concept being metaphorically described (the target of the metaphor) and the second is 
the way it is considered (“war”, the source of the metaphor). 

The usage of metaphors has as side effect the projection of a system of attributes, 
relations, scenes, scripts etc., from the second concept to the first one. For example, an 
argument implies usually two sides and follows scripts similar to wars. That implies 
that a whole system of words and expressions may be used similarly (e.g. “he attacked 
her with a new fact”). 

In analyzing the metaphors considered by Lakoff and Johnson, several classes of 
concepts used as a second part (source) in metaphors may be identified: resources, 
instruments, physical objects, humans, actions and processes. These may be further 
grouped according to Lakoff and Johnson as Orientational, Structural and Ontological. 

For the purpose of the system presented here, an ontology has been developed for 
the concepts used as source of metaphors. These concepts are taxonomically 
organized, as in the following fragment: 

Physical object 

Organism 

Human 

Instrument 

Building 

Pillar 

Metaphors are used for a purpose; they reflect the writer’s intentions. We could 
say, from a speech act theory perspective, that they have strong illocutionary force. 
For example, saying that something is “a pillar of stability” has an important effect 
because pillars are very important for buildings, and the idea of a reliable 
building is very important to everyone. Therefore, the reason for which a metaphor is 
used, that is, the intentionality of the writer behind the metaphor, is very important 
and may give very useful insights for a true understanding of the text where the 
metaphor has been used. 

To capture the reasons for which a metaphor is used, our metaphor ontology defines, 
for each concept that can be a source of a metaphor, a set of attributes that are 
transferred to the target concept. For each attribute, a set of typical values is 
included. For example, the organism concept is often used in metaphors for expressing a 
specific state (healthy - “a healthy economy”, illness etc.), a suffering (e.g. pains - 
“the company experience the growing pains”) or a property (e.g. sensitivity). All 
these attributes are specified in the organism concept and may be considered by the 
metaphor processing system. 

The important effect of metaphors may be explained by the significance 
of these attributes for our existence and their resonance with our experience. We could 
say that these attributes reflect the intentionality behind the usage of metaphors. 
Therefore, these attributes and values are used for explaining specific metaphors in a 
given context domain (see section 5.3). 
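
As a rough illustration of the attribute sets described above, the organism concept could be stored as a mapping from attributes to typical values. This is a hedged sketch only: the dictionary layout, the English attribute names (`state`, `suffering`, `property`) and the function `attribute_of` are assumptions made for the example; the paper does not describe the actual representation used in its metaphor ontology.

```python
from typing import Dict, List, Optional

# Source concept -> attribute -> typical values. The values are the ones
# mentioned in the text; the layout itself is hypothetical.
METAPHOR_ONTOLOGY: Dict[str, Dict[str, List[str]]] = {
    "organism": {
        "state": ["healthy", "illness"],
        "suffering": ["pains"],
        "property": ["sensitivity"],
    },
}


def attribute_of(source: str, value: str) -> Optional[str]:
    """Return the attribute of `source` whose typical values include `value`."""
    for attribute, values in METAPHOR_ONTOLOGY.get(source, {}).items():
        if value in values:
            return attribute
    return None  # unknown source concept or unknown value
```

A lookup such as `attribute_of("organism", "pains")` then tells an explanation component which aspect of the source concept a given metaphor transfers to its target.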



5 Metaphor Processing 

This section presents a knowledge-based system for the identification, annotation and 
usage of metaphors in a corpus as a support for learning foreign terminology. The 
approach has been tested on a collection of documents retrieved from the New York 
Stock Exchange site [11]. The architecture of the system is presented in figure 1, 
reflecting the processing done by various modules, the information flow and the 
interactions. The positions of the professor and of the student are chosen to illustrate 
their access to the modules and information in the system. 

Fig. 1. The architecture of the system 

Metaphor processing involves three different activities: 

□ identification (acquisition) of new metaphors, 

□ annotation of the identified metaphors, 

□ usage of the metaphors. 

5.1 Metaphor Identification 

Metaphor identification may be done manually, with a text editor, or 
(semi)automatically, with the help of a concordancer. Corpora of texts may be used 
for the automatic identification of typical collocations. Some of the detected 
collocations may be candidates for metaphors (e.g. “to sustain a loss”). Candidates 
for metaphors may also be detected automatically starting from words often used in 
orientational metaphors (e.g., “high”, “in front of”: “high spirit”, “high level of 
expectations”, “the future is in front of us”, “a challenge is in front of us” etc.), or 
from other metaphorical concepts in the metaphor ontology. For example, all the 
concordances found for “pillar” in a collection of finance texts from NYSE [11] are 
metaphors: 

ing what is really one of the pillars of our whole system in this country. The N 
as ever faced. You've been a pillar of stability and a great, great leader for 
or tool and a very important pillar of economic education. Going further, thou 
airman Levitt has been a true pillar of strength in bringing the international 

Metaphors may also be identified when some semantic constraints are not met after 
a semantic analysis, as in the example “Stocks are very sensitive creatures” (“stock” 
is not animate, so it does not meet the constraint imposed by “creature”). 
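
The two identification routes described in this subsection, concordance lines for a seed word and a violated semantic constraint, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the tiny feature lexicon and the set of animacy-requiring nouns are invented for the example and do not reproduce the concordancer or the semantic analyser used in the project.

```python
import re
from typing import List


def concordance(text: str, keyword: str, width: int = 30) -> List[str]:
    """Return keyword-in-context lines with `width` characters on each side."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\w*", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left}{match.group(0)}{right}")
    return lines


# Hypothetical toy lexicon for the semantic-constraint check.
REQUIRES_ANIMATE = {"creature", "animal", "person"}
FEATURES = {"stock": {"animate": False}, "dog": {"animate": True}}


def violates_animacy(subject: str, predicate_noun: str) -> bool:
    """True if the predicate noun needs an animate subject and the subject
    is known to be inanimate; unknown subjects are not flagged."""
    if predicate_noun not in REQUIRES_ANIMATE:
        return False
    is_animate = FEATURES.get(subject, {}).get("animate", True)
    return not is_animate
```

In this sketch, `concordance(corpus, "pillar")` yields candidate lines for manual inspection, and `violates_animacy("stock", "creature")` flags the example sentence as a metaphor candidate. Both outputs are candidates only; the text describes the identification as semi-automatic.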



5.2 Metaphor Annotation 

Text annotation is often used in computational linguistics. The tagged corpora 
obtained after manual annotation are used for training tagging programs to be used on 
unannotated texts. In our approach, metaphors are annotated not for training programs 
but for knowledge-based programs, with the aim of providing explanations and 
perspectives on the metaphors in the corpus. 

Metaphors are annotated in the texts in which they occur, according to corpora 
markup standards. For this, a specific XML (SGML) markup element was defined 
(“metaph”), which may be included in corpora DTDs (Document Type Definitions). The 
metaphor element has three attributes: 

• What - the concept that is metaphorically presented (e.g. “stock”). 

• How - the way the concept is presented; the value of this attribute may be one of 
the types of metaphorical concepts from the metaphor ontology (e.g. “organism”). 

• Why - the reason for which the metaphor was used. This reason may be one of 
the attributes specified for the metaphorical concept in the metaphor 
ontology. It reflects the intentionality of the writer in choosing to use the 
metaphor. 

An example of a metaphor markup used in LarFLaST is: 



<metaph what="stock" how="organism" why="reactivity"> 
Stocks are very sensitive creatures</metaph> 
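
To show how such markup could be consumed by a program, the sketch below reads the three attributes of every `metaph` element with Python's standard `xml.etree.ElementTree` module. The surrounding `<text>` and `<s>` elements and the function name `read_metaphors` are assumptions made for the example; the actual corpus DTD is not given in the paper. The second sample sentence and its attribute values follow the explanation shown in section 5.3.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical corpus fragment; only the <metaph> element comes from the paper.
SAMPLE = """<text>
<s><metaph what="stock" how="organism" why="reactivity">Stocks are very sensitive creatures</metaph>.</s>
<s><metaph what="company" how="organism" why="sensitivity">A company faced with growing pains</metaph>.</s>
</text>"""


def read_metaphors(xml_source: str) -> List[Dict[str, str]]:
    """Return one record per <metaph> element: its attributes and its text."""
    root = ET.fromstring(xml_source)
    return [
        {
            "what": m.get("what"),   # concept that is metaphorically presented
            "how": m.get("how"),     # source concept from the metaphor ontology
            "why": m.get("why"),     # transferred attribute (writer's intention)
            "text": "".join(m.itertext()).strip(),
        }
        for m in root.iter("metaph")
    ]
```

Each record keeps the annotated phrase together with its "what", "how" and "why" values, which is the information the explanation pages are built from.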



5.3 Metaphor Usage 

The metaphor-annotated corpus may be used for several purposes. In our system, 
developed for the LarFLaST project, the corpus is used for explanations in teaching, 
providing insights into the concepts being learned. To this end, XSL 
descriptions are used for presenting different (personalized) web pages, starting from 
the XML corpus encoding. An example of a fragment of a very simple explanation 
obtained from the corpus and a simple XSL file (referring to the attributes of the 
“metaph” markup) is given below: 

Stocks are seen as: 

object --> pieces of ownership in the corporation, called stock, 
object --> stockholders. 

organism --> Stocks are very sensitive creatures, 
organism --> They react to all kinds of influences, large and small, 
organism --> their sensitive reactions register as price changes, 
organism --> News events can trigger a change in stock prices when they affect 
the laws of supply and demand, 
resource --> a price for its stock. 



Other explanations, obtained with the same corpus but with a different XSL file, are: 

Some reasons for using these metaphors are: 

• Stocks are very sensitive creatures - is reflecting that the stock is a/an 
organism. Reason --> better reflects the reactivity of stock. 

• A company faced with growing pains - is reflecting that the company is 
a/an organism. Reason --> better reflects the sensitivity of company. 
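
The two views shown above can be produced from the same annotation records by two different presentation functions, which mirrors the role of the two XSL files. The paper uses XSL for this step; the Python sketch below is a substitute illustration, and the record layout and the function names `seen_as` and `reasons` are assumptions.

```python
from typing import Dict, List

# Annotation records as they could be extracted from the "metaph" markup.
ANNOTATIONS: List[Dict[str, str]] = [
    {"what": "stock", "how": "organism", "why": "reactivity",
     "text": "Stocks are very sensitive creatures"},
    {"what": "company", "how": "organism", "why": "sensitivity",
     "text": "A company faced with growing pains"},
]


def seen_as(annotations: List[Dict[str, str]], concept: str) -> List[str]:
    """First view: how one target concept is presented in the corpus."""
    return [f"{a['how']} --> {a['text']}"
            for a in annotations if a["what"] == concept]


def reasons(annotations: List[Dict[str, str]]) -> List[str]:
    """Second view: the reason recorded for each annotated metaphor."""
    return [
        f"{a['text']} - is reflecting that the {a['what']} is a/an {a['how']}. "
        f"Reason --> better reflects the {a['why']} of {a['what']}."
        for a in annotations
    ]
```

Keeping the annotations separate from their presentation is what allows different, personalized pages to be generated from a single annotated corpus.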



The ontology of metaphors may be used in connection with the annotations in the 
corpus for making inferences about what to include in the explanations. For example, 
several metaphors may be compared according to their “what”, “how”, and “why” 
attributes, and inferences may be performed that also take the metaphor ontology into 
account. 

Metaphors may also be used for other kinds of text analysis: linguistic, stylistic, 
artistic, philosophical or even psychological (the metaphors used by a person may 
reflect his experiences; artists are the best example). 

The system was implemented in Perl and in XRL [1], an object-oriented knowledge 
representation environment developed in Common Lisp. A translator from the 
conceptual graphs interchange format CGIF [2] to XRL is under development. 
Translation to XRL from XML-based ontology markup languages (OML [12], SHOE [13]) 
will also be considered. 



6 Conclusions and Comparison with Other Approaches 

Metaphors may be of real help in teaching activities by providing insights that are 
fundamental to a true understanding of concepts. In fact, every good teacher uses 
metaphors in his lectures. Moreover, Lakoff and Johnson [6] consider that they are 
fundamental to our way of thinking. 

Metaphors may be identified in texts (semi)automatically, they may be annotated 
using SGML (XML), and they may be used in knowledge-based systems. An ontology for 
metaphors may be developed. The feasibility of this approach has been tested within 
the LarFLaST project [9] for explaining metaphors to students learning finance 
terminology in English. 

An approach similar to our metaphor ontology has been proposed by James Martin. 
His MetaBank [8] includes three kinds of knowledge: about the source, about the 
target of the metaphor, and about the metaphor itself. The third kind of knowledge 
comes from the analyses of the Berkeley Metaphor Site [3], a web site containing 
approximately 200 metaphors available textually on web pages indexed by name, source 
and target domains. Martin also proposed a similar idea of semi-automatic finding of 
metaphors [8]. 

Our approach differs from Martin’s in the use of a corpus annotated with 
metaphors and in the intentional annotations. Another difference is the main purpose 
of the metaphor processing. The systems of Martin [8] and Fass [4] try to 
find and process metaphors in English text. Our approach combines knowledge-based 
techniques for metaphor identification and representation with annotation 
with intentional information. The main purpose of our system is to provide 
dynamically generated explanations about metaphors in an annotated corpus for foreign 
students learning English terminology. 


Abstract. The paper discusses some problems of the presentation and 
processing of linguistic knowledge needed for the development of a real-size 
Bulgarian linguistic resource to be used in a multilingual text generation 
system covering the sublanguage of software manuals. The sublanguage 
volume is specified by corpus analysis. The text generation is based on the 
Systemic Functional Linguistics theory. A method for developing 
lexico-grammar resources by re-using an existing resource for another 
language is described and illustrated with three examples. 



1 Introduction 

The paper deals with some problems of knowledge presentation and processing for the 
needs of automatic text generation. The discussion is focused on the presentation and 
processing of the linguistic knowledge needed for the development of a real-size 
Bulgarian linguistic resource to be used in a multilingual text generation system, 
created under the international project AGILE [3, 5]. The aim of the project is to 
develop a generic set of tools and linguistic resources for generating CAD/CAM 
software instructional texts in Bulgarian, Czech and Russian. They are developed 
through extensive use of the grammar development environment and multilingual 
sentence generator KPML (Komet-Penman MultiLingual system [2]), whose theoretical 
basis is Systemic Functional Linguistics (SFL, [4]). 



2 Corpus Analysis and Sublanguage Specification 

Technical manuals within specific domains constitute a sublanguage [8]. An important 
property of a sublanguage is its lexical and syntactical closure. The lexical closure is 
determined by the domain specificity of the sublanguage, as well as by the norms of 
technical communication, which prefer monosemy to synonymy. The syntactic closure 
leads to the application of a small number of rigid syntactical structures. 

The sublanguage specification was made on the basis of corpus analysis. Because of 
the sublanguage limitations, a small corpus, containing nine procedural texts with 1025 
words and 194 coding units, was found to be sufficient for the needs of the project. 
The corpus processing was used to help in the determination of: 

• domain model concepts, i.e. the domain ontology 

• text planning processes 

• lexical resources 

• grammar resources. 

The paper will focus further on the determination of lexico-grammar resources. 

The corpus analysis, made in terms of the SFL conceptual base, leads to the fol- 
lowing conclusions about the sublanguage used in Bulgarian software manuals: 

• The great majority of the rank units are clauses and the rest are nominal groups. 
The prepositional groups do not occur in instructional texts of the corpus. 

• The processes are predominantly of the directed-material type. Sometimes relational, 
mental, and non-material processes are found in the corpus. 

• Finite and positive polarities predominate over non-finite and negative polarity 
features in this particular sublanguage. 

• Most of the analysed clauses are non-modal. In the case when modality is ex- 
pressed in the clause it is of the ability, inclination and obligation type. 

• The mood is usually realised by an imperative clause. 

• The voice is active, although a few instances of middle were counted. The passive 
voice was not found in instructional texts at all. 

• The user is the most frequent agent, the alternative is program objects appearing as 
agents. 

• Most of the text units are members of a complex clause. The hypotactic relation is 
realised mainly by a manner or purpose conjunction, although condition and tempo- 
ral conjunctions occur as well. Paratactic relation is realised by the additive and 
alternative conjunctions. 



3 Approach to Resource Development 

The adopted approach for resource development was to re-use an existing large-scale 
English grammar as a base for the Bulgarian resource, for the following reasons: 

1. lack of a large-scale Bulgarian computational grammar aimed at automatic text 
generation; 

2. need for fast prototyping; 

3. re-usability of the newly built resource, i.e. a natural tendency to extend it beyond 
the immediate goal, the coverage of the given sublanguage. 

This approach has been shown to be effective in several previous developments 
([1], [2], [7]) and avoids the need to build a large-scale grammar from scratch. 
In [3] the reuse of an existing English grammar for typologically different (Slavic) 
languages is studied across the SFL model. SFL is a functional theory of language, in 
which the concept of function is reflected in three metafunctions: ideational, 
concerning world representation; interpersonal, reflecting the role relations of 
speaker and hearer in a discourse; and textual, representing the patterns for creating 
cohesive and coherent text. The strata distinguished in SFL are lexico-grammar, 
semantics, and context. Linguistic description, at each stratum, has two aspects, one 
representing linguistic systems (paradigmatic axis), the other the structural 
realizations of these systems (syntagmatic axis). The paradigmatic axis is represented 
through a system network, resembling a type hierarchy supporting multiple inheritance. 

According to Halliday, a system network is a theory of the language as a resource for 
realising meaning. A system represents a choice between possible semantic, 
lexico-grammatical or phonological alternatives. Systemic Functional Grammar (SFG) is 
an approach to natural language syntax that represents grammar as a network of systems. 
In the process of sentence generation each system, responsible for a given aspect of 
meaning, imposes specific constraints on the form of the sentence. In this way the 
generation of a sentence is the satisfaction of a set of constraints, specified by a 
system network during its tracing. Thus the SFG approach describes grammatical 
structures in terms of co-satisfaction of constraints. 

The construction of an SFG grammar is led by three organisational principles: axi- 
ality, delicacy, and rank. Axiality expresses the relation between paradigmatic, func- 
tionally motivated features and syntagmatic structures realizing them and determines 
the way systems are formulated: a system has input conditions phrased in terms of 
grammatical features, and has as output grammatical features, which may be accom- 
panied by realization statements connecting specific constraints on the realization of 
the surface form to a particular feature. Delicacy is a principle organizing a grammar 
in a vertical manner, according to levels of specificity. Rank expresses a generalized 
form of a constituency hypothesis (a sentence can be divided into clauses, clauses into 
groups, groups or phrases into words, and words into morphemes). A part of the 
system network, presenting the highest rank (lowest delicacy) systems for the English 
SFG Nigel, is shown in Fig. 1. 

The cross-linguistic analysis in [3] leads to the following observations. 

• Languages tend to show more similarities on the more abstract strata of linguistic 
organisation than on the less abstract ones (i.e., they express similar meanings 
in different grammatical terms). 

• Languages tend to be similar on the paradigmatic axis and less similar in terms 
of syntagmatic realisation. 

• Systems of low delicacy tend to be similar across languages, and systems of 
higher delicacy tend to be dissimilar. 

• There may be different preferences in different languages concerning the 
grammatical rank at which a particular meaning is expressed. 

• Different languages may distribute functional responsibilities differently across 
metafunctions. 

Fig. 1. Part of the system network of the Nigel Systemic Functional Grammar 

The cross-linguistic comparison of lexico-grammar features shows that Bulgarian is 
closer to English than the other Slavic languages with respect to some syntactic 
features that are pragmatically important for automatic generation (lack of cases, 
explicit articles etc.), which may facilitate the re-use of the English resource for 
some phenomena. Naturally, Bulgarian differs from English in many other lexico-grammar 
features, as well as in its richer morphology. 

The English grammar resource used as a basis in the project AGILE is the Nigel 
grammar, mainly developed by Matthiessen on the SFL foundation [6] and extended 
by many people afterwards, resulting in a large-scale English computational grammar 
that covers a broad range of grammatical phenomena. The organization of Nigel 
separates specifications of syntactic structures from a description of their 
communicative functions. The analysis in [1] shows that the functional description 
varies less across languages than the syntactic description and, since the functional 
component of the description provides the overall organisation of the grammar, the SFL 
approach to language phenomena can serve as a general guideline for the grammatical 
description of a wide range of languages without enforcing artificial uniformity. The 
use and re-use of Nigel is supported by the grammar development environment and 
multilingual sentence generator KPML, which, like Nigel, is available free of charge. 



4 Method of NewResource Development 

The approach of developing a new lexico-grammar resource by re-using an existing 
resource for another language was implemented by means of a Method of NewResource 
Development, which is sketched below. 

Initial information for the method: 

1. BaseResource (the Nigel grammar); 

2. TargetExamplesSet - set of sentences, representative for the sublanguage re- 
vealed during corpus analysis. It covers the lexico-grammar features of the 
sublanguage. 

The method consists of two phases: Phase 1 “Constructing WorkResource” and 
Phase 2 “Modification of the WorkResource”. 

Phase 1 “Constructing WorkResource” 

Step 1. Tracing of the BaseResource with an example sentence from the 
TargetExamplesSet. 

Step 2. All the systems from the BaseResource used in the example tracing 
are added to the WorkResource. 

Step 3. Identification of all places in the system network of inappropriate 
lexico-grammar choice or gaps in the BaseResource, preventing the example 
generation. They are added to a GrammarProblemsList with a pointer to the 
corresponding example sentence. 

Step 4. Steps 1-3 are repeated for each sentence from the TargetExamplesSet. 

At the end of Phase 1 the following information structures are available: 

1. WorkResource - a subset of the BaseResource system network; 

2. a list of grammar problems with test examples from the TargetExamplesSet. 

Phase 2 “Modification of the WorkResource” 

Step 1. Solving a problem from the GrammarProblemsList by modifications 
in appropriate system networks of the WorkResource. The solution is 
demonstrated by proper generation of the corresponding example sentence. 

Step 2. Adding the example sentence to GeneratedExamplesSet and checking 
the WorkResource with all the examples of this set. 

Step 3. Steps 1-2 are repeated for each problem from the GrammarProblemsList. 

Step 4. Removal of all system components representing unused lexico-grammar 
features from the WorkResource. 

It is possible during this phase to process also additional problems with corresponding 
test examples, inserted in the GrammarProblemsList in order to extend the sublan- 
guage and the NewResource coverage. 
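
The control flow of the two phases can be summarised in a short sketch. It is illustrative only and rests on several assumptions: a resource is reduced to a set of system names, and the callbacks `trace`, `fix` and `generates` are hypothetical stand-ins for work that, in the method as described, is carried out by a grammar developer with the help of the development environment. Step 4 of Phase 2 (removal of unused features) is left out because it depends on how features are stored.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical callback types (not part of any real grammar toolkit API).
Trace = Callable[[Set[str], str], Tuple[Set[str], Set[str]]]
Fix = Callable[[Set[str], str, str], None]
Generates = Callable[[Set[str], str], bool]
Problem = Tuple[str, str]  # (system name, example sentence)


def phase1(base_resource: Set[str], examples: Iterable[str],
           trace: Trace) -> Tuple[Set[str], List[Problem]]:
    """Phase 1: build the WorkResource and the GrammarProblemsList."""
    work_resource: Set[str] = set()
    problems: List[Problem] = []
    for sentence in examples:                          # Step 4: every example
        used, faulty = trace(base_resource, sentence)  # Step 1: trace
        work_resource |= used                          # Step 2: collect systems
        problems.extend((system, sentence)             # Step 3: record problems
                        for system in sorted(faulty))
    return work_resource, problems


def phase2(work_resource: Set[str], problems: List[Problem],
           fix: Fix, generates: Generates) -> List[str]:
    """Phase 2: solve each problem and re-check all examples generated so far."""
    generated_examples: List[str] = []
    for system, sentence in problems:                  # Step 3: every problem
        fix(work_resource, system, sentence)           # Step 1: modify a system
        if sentence not in generated_examples:         # Step 2: extend the set
            generated_examples.append(sentence)
        broken = [s for s in generated_examples
                  if not generates(work_resource, s)]  # Step 2: regression check
        if broken:
            raise RuntimeError(f"fix for {system} broke: {broken}")
    return generated_examples
```

The regression check in Phase 2 reflects the requirement that every modification be verified against all examples already generated, so that a later fix does not silently undo an earlier one.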

The application of the method is illustrated below by three examples (the 
choices made during tracing the system with the examples are shown in bold italics). 

EXAMPLE 1 “Finite in imperative - 2 person, plural” 

The problem: In Bulgarian the imperative verb form is finite (second person 
plural for the formal, "polite" style), while the English imperative is realised by 
a nonfinite form (stem). 

Test sentence: “Въведете координатите!” /enter [imperative-2p, pl.] coordinates/ 

Phase 1: The following system is connected with the problem: 

MOOD-TYPE (independent-clause-simplex) -> 
    [indicative]  Insert (Subject), Insert (Finite) 
    [imperative]  Insert (Nonfinite), 
                  Inflectify (Nonfinite, stem) 



Phase 2 

Step 1. Necessary modifications: change the feature Nonfinite to Finite 
and fix the verb form to second person plural. Thus the system MOOD-TYPE is 
modified as follows: 

MOOD-TYPE (independent-clause-simplex) -> 
    [indicative]  Insert (Subject), Insert (Finite) 
    [imperative]  Insert (Finite), 
                  Inflectify (Finite, secondperson-form), 
                  Inflectify (Finite, plural-form) 

In Bulgarian an informal imperative verb form (second person singular) is 
also used. Though it is not covered in the sublanguage of software manuals, it may be 
included in the NewResource by means of an additional system and the following 
modification: 

MOOD-TYPE (independent-clause-simplex) -> 
    [indicative]  Insert (Subject), Insert (Finite) 
    [imperative]  Insert (Finite), 
                  Inflectify (Finite, secondperson-form) 

IMPERATIVE-TYPE (imperative) -> 
    [polite-imperative]    Inflectify (Finite, plural-form) 
    [informal-imperative]  Inflectify (Finite, singular-form) 

If the decision to extend the sublanguage is taken, an additional test sentence has to 
be added: “Въведи координатите!” /enter [imperative-2p, sing.] coordinates/ 

Step 2. The test examples are added to the GeneratedExamplesSet and all its 
members are checked for generation against the modified WorkResource. 
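
To make the tracing of Example 1 concrete, the following sketch encodes the two modified systems as plain data and collects the realisation statements along one path. The dictionary layout and the function `trace` are assumptions made for illustration; they are not the representation used by the generation environment described in the paper.

```python
# Toy encoding of the modified systems of Example 1 (an assumed layout):
# system name -> (entry feature, {choice: realisation statements}).
SYSTEMS = {
    "MOOD-TYPE": ("independent-clause-simplex", {
        "indicative": [("Insert", "Subject"), ("Insert", "Finite")],
        "imperative": [("Insert", "Finite"),
                       ("Inflectify", "Finite", "secondperson-form")],
    }),
    "IMPERATIVE-TYPE": ("imperative", {
        "polite-imperative": [("Inflectify", "Finite", "plural-form")],
        "informal-imperative": [("Inflectify", "Finite", "singular-form")],
    }),
}


def trace(systems, start, selections):
    """Follow the chosen features and collect the realisation statements."""
    features, realisations = [start], []
    frontier = [start]
    while frontier:
        feature = frontier.pop()
        for name, (entry, choices) in systems.items():
            if entry == feature:                 # the system is entered
                choice = selections[name]        # choice made for this sentence
                features.append(choice)
                realisations.extend(choices[choice])
                frontier.append(choice)
    return features, realisations


features, realisations = trace(
    SYSTEMS, "independent-clause-simplex",
    {"MOOD-TYPE": "imperative", "IMPERATIVE-TYPE": "polite-imperative"})
```

For the polite test sentence the collected statements insert a Finite element and inflect it for second person and plural, which mirrors the choices listed in the systems above.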



EXAMPLE 2 Aspect 

The problem: Two aspect forms of verbs are distinguished in Bulgarian: 
a/ the imperfective forms emphasise the continuous, incomplete nature of the action; 
b/ the perfective forms refer to the action as a whole or focus on its completion. 

Test examples: “Въведете координатите!” /enter [perfective, imperative-2p, pl.]/, 
“Въвеждайте координатите!” /enter [imperfective, imperative-2p, pl.]/ 

Phase 1: The BaseResource does not contain an analogous system. The position of the 
new system is determined by the need to use the feature independent-clause-simplex 
as its input. 

Phase 2: 

Step 1. Inclusion in the WorkResource of the system 

ASPECT (independent-clause-simplex) -> 
    [perfective]    Classify (Process, perfective-verb) 
    [imperfective]  Classify (Process, imperfective-verb) 

Step 2. The test examples are added to the GeneratedExamplesSet and all its 
members are checked for generation against the modified WorkResource. If the 
GeneratedExamplesSet contains complex sentences with dependent clauses, these clauses 
will not obtain Aspect variations, because the feature independent-clause-simplex was 
chosen as the input of the new system. In such a case it will be necessary to return 
to Step 1 and change the resource appropriately until all test examples from the 
GeneratedExamplesSet are generated correctly. 

EXAMPLE 3 Phase 

The problem: To generate clauses with complex verb forms, consisting of a phase verb 
and a main verb, allowing the presentation of different phases of a process. 

Test example: “Започнете да въвеждате координатите!” /begin [imperative-2p, 
pl.] to enter [2p, pl.] coordinates/ (see Fig. 2) 

Phase 1: The following systems are connected with the problem: 

PHASE (transitivity-unit) -> 
    [not-phase] 
    [phase]  Insert (Phase), Classify (Phase, phase-verb), 
             Insert (Phasedependent) 

PHASEDEPENDENT-TYPE (phase) -> 
    [phaseinfinitive]  Inflectify (Phasedependent, stem), 
                       Insert (Tophase), Lexify (Tophase, to), 
                       Order (Tophase, Phasedependent) 
    [ingphase]         Inflectify (Phasedependent, ingparticiple) 

Phase 2: 

Step 1. The following analogy exists between the English and Bulgarian phase verb 
constructions: the phaseinfinitive corresponds to the “da-construction” in Bulgarian: 
Begin to enter the coordinates! <-> Започнете да въвеждате координатите! 
/begin [imperative-2p, pl.] enter [2p, pl.] coordinates/ 

The ingphase (ingparticiple form) may be matched with the Bulgarian construction with 
nominalization: Begin entering the coordinates! <-> Започнете въвеждането 
на координатите! 
/begin [imperative-2p, pl.] entering [nominalization noun] coordinates/. 

In English the Phasedependent verb form in the phaseinfinitive construction is the stem, 
while in Bulgarian the Phasedependent verb form in the “da-construction” agrees with 
the Phase verb in person and number: Begin to enter the coordinates! <-> Започнете 
да въвеждате координатите! /begin [imperative-2p, pl.] enter [2p, pl.] coordinates/ 

Therefore the following modifications are made: 

PHASE (transitivity-unit) -> 
    [not-phase] 
    [phase]  Insert (Phase), Classify (Phase, phase-verb), 
             Insert (Phasedependent) 

PHASEDEPENDENT-TYPE (phase) -> 
    [da-phase-construction] 
        Insert (Daphase), Lexify (Daphase, da), 
        Order (Daphase, Phasedependent), 
        Agreement (Phase, Phasedependent, number-form), 
        Agreement (Phase, Phasedependent, person-form) 
    [phase-nominalization] 
        Inflectify (Phasedependent, nominalization-noun) 

A possible sublanguage extension is to add a phase construction with nominalisation 
(Започнете въвеждането на координатите! /begin [imperative-2p, pl.] entering 
[nominalization noun] coordinates/). For this, the corresponding target example has 
to be included in the TargetExamplesSet and the method is repeated from Phase 1 
with the new example, which would need additional systems from the BaseResource 
and would add further problems to the GrammarProblemsList. 

Step 2. The test example “Започнете да въвеждате координатите!” is 
added to the GeneratedExamplesSet and all its members are checked for generation 
against the modified WorkResource. 




Fig. 2. Screenshot of the generated Example 3 sentence with 'da' - construction 



The screenshots in Fig. 2 and Fig. 3 show the generated sentences from Example 3 
(with the 'da'-construction and with nominalisation), together with the grammatical 
features collected during the tracing of the system network, which constrain the 
generated structure. 

Fig. 3. Screenshot of the generated Example 3 sentence with nominalisation 



5 Conclusions 

The intermediate version of the Bulgarian lexico-grammar resource developed so far 
covers exclusively procedural texts from the sublanguage of software manuals. It 
contains 110 new or modified systems, which is about 20% of the whole resource. The 
ongoing work on further development of this resource is oriented towards the 
presentation of descriptive texts as well, and concerns text planning and 
lexico-grammar problems such as support of modal clauses, different types of clause 
aggregation in complex sentences, etc. 
Abstract. The paper describes the recognition of diphthong and 
triphone signs, which is useful in the automatic generation of English text 
from a Pitman Shorthand Language (PSL) document. PSL is used to note 
down dictated/spoken text and is widely practiced in all 
organizations where English is the transaction medium. It has built-in 
practical advantages, because of which it is universally acknowledged. 
This recording medium will continue to exist in spite of considerable 
developments in Speech Processing Systems, which are not universally 
established yet. Because of the wide usage of PSL and the effort towards its 
automation, PSL processing has emerged as a potential research problem 
in the areas of Pattern Recognition, Image Processing, 
Artificial Intelligence and Document Analysis. 

There are six long and six short vowels in PSL. These vowels are 
represented by signs/symbols that override a stroke symbol and 
require recognition for composing English text from phonetic text 
documented through PSL. The work pertaining to other constructs of 
PSL has already been carried out at word level by the authors [7,8]. During 
dictation, however, it is common that vowels are joined to form one syllable, 
called a Diphthong. A diphthong appended with a tick-mark is called a 
Triphone and represents any vowel that immediately follows the 
diphthong in a stroke. The diphthongs have to be recognized to 
generate the correct English equivalent text. The present 
work comprises the definition of diphthong primitives, the creation of a 
knowledge base and the development of an algorithm for their 
recognition. A suitable shape recognition algorithm is assumed to be 
available. This work is new, and this module serves as a 
prerequisite for the complete recognition of a PSL document and the 
generation of an equivalent and correct English text. 

Keywords: Diphthong, Triphone, Pitman Shorthand Language, 
Knowledge Base, English Text. 

1 Introduction 

The PSL is used to note down the dictated/spoken text based on phonetic composition 
(PC) of words/sentences. It is widely practiced in all organizations where English is 
the transaction medium. This has built-in practical advantages, such as compressed 
mode of recording, phonetic based composition and recording speed that matches 
dictation, which have made PSL a universally acceptable medium of recording. This 
recording medium will continue to exist in spite of considerable developments in 
Speech Processing Systems, which are not universally established yet [4]. The PSL 
could also find the promising applications in the areas such as private and secret 
communication, compact storage of documents, speech to text conversion, adaptation 
of PSL to other languages, etc. These applications can be realized with the 
developments in both computer and communication technologies. Hence, the problem 
of automatic generation of complete English text, in the form of printed document, 
from the spoken/dictated text has emerged as a problem of recent research interest in 
the areas of Pattern Recognition, Image Processing, Artificial Intelligence and 
Document Analysis. 

Leedham et al. [2,3] and Hemanth Kumar & Nagabhushan [1] have addressed the 
problem of automatic recognition of Pitman Shorthand strokes with emphasis on 
shape recognition. The process of automatic text generation from a PSL document is 
viewed as a 3-phase pattern recognition problem, namely (i) shape recognition, (ii) 
syntactic specification and analysis and (iii) English text production, as shown in 
Fig. 1. 




Fig. 1. Three phases in Automatic Text Generation from PSL 

The last phase consists of the generation of a correct and equivalent English text from 
the PSL document. This is considered a challenging task because of subtasks such as 
(i) converting the phonetic text to English text, (ii) correct and equivalent word 
substitution, (iii) resolving homophones, (iv) domain-specific context resolution, 
(v) handling of grammalogues and (vi) grammatical corrections. The composition of 
English text (CET) from phonetic text documented through PSL has been attempted as 
initial work in this direction, and a paper based on it has been communicated by the 
authors [7]. In this paper we present the definitions of the PSL stroke primitives, 
the knowledge base representation and creation, and the development of an algorithm for 
the recognition of diphthongs and triphones that are encountered during dictation. The 
remaining issues of automatic text generation from a PSL document are still under 
investigation. 

2 Definition of Primitives 

The shape recognition phase gives the primitives that describe the consonant strokes, 
which are the basic characters in PSL. The primitives defined are given in Table 1. 



Table 1. PSL Consonants and Primitives 



[Character column: stroke glyphs, not reproduced] 

Phonetic name   Primitive for stroke     Stroke nature   English consonant 
Pee             (120 line)               (thin)          P 
Bee             (120 line)               (thick)         B 
Tee             (vertical line)          (thin)          T 
Dee             (vertical line)          (thick)         D 
Chay            (60_line)                (thin)          CH 
Jay             (60_line)                (thick)         J 
Kay             (horizontal line)        (thin)          K 
Gay             (horizontal line)        (thick)         G 
Ef              (lr_down_arc)            (thin)          F 
Vee             (lr_down_arc)            (thick)         V 
Ith             (vertical arc lf)        (thin)          TH 
Thee            (vertical arc lf)        (thick)         TH 
Ess             (vertical arc rt)        (thin)          S 
Zee             (vertical arc rt)        (thick)         Z 
Ish             (lr_up_arc)              (thin)          SH 
Zhee            (lr_up_arc)              (thick)         ZH 
Em              (horizontal arc up)      (thin)          M 
En              (horizontal arc down)    (thin)          N 
Ing             (horizontal arc down)    (thick)         NG 
El              (up_right_arc)           (thin)          L 
Ar, Ray         (up_left_arc)            (thin)          R 
Way             (bottom up hook)         (thin)          W 
Yay             (bottom down hook)       (thin)          Y 
Hay             (bottom circled hook)    (thin)          H 


An algorithm has already been devised for the recognition of these basic primitives and 
is assumed to be available here [1]. The strokes are written above, on or through the 
base line, depending on the sound of the first vowel in the word, as shown in Fig. 2. 
The vowels are classified into six long vowels, as heard in words like wah!, ale, each, 
all, oak, ooze, and six short vowels, as heard in words like at, etch, it, odd, pub, 
cook. The short vowels are represented by light dash (-) and dot (.) symbols and 
the long vowels by thick dash (-) and dot (.) symbols. There are three vowel places 
on either side of the stroke, six in all; vowels are written before or after 
the stroke depending upon their nature. This is depicted in Fig. 3. 

[Figure: a stroke written (1) above, (2) on and (3) through the baseline] 



Fig. 2. Three Positions for the Strokes 



[Figure: a stroke with vowel places 1, 2 and 3 marked on both sides, before and after the stroke] 



Fig. 3. Vowel positions 



3 Diphthongs and Triphones 

Diphthongs are the union of two vowel sounds in one syllable and are quite commonly 
used during dictation. There are four diphthongs in PSL, namely i, ow, oi and u, as 
heard in the sentence “I now enjoy music”. Table 2 gives the diphthong signs and 
the defined primitives. 



Table 2. Diphthong signs and primitives 



Diphthong   Contained vowels   Devised primitive   Sign 
I           I+E                (Vsign)             V 
OW          O+U                (InVsign)           ∧ 
OI          O+I                (greater_sign)      > 
U           U+E                (InUsign)           ∩ 

The signs for i and oi are written in the first place and the signs for ow and u are 
written in the third place, as depicted in Fig. 3. In practice the diphthong signs are 
conveniently joined to the consonant symbols during recording in PSL. However, in this 
work it is assumed that the diphthong signs are written separately near the consonant, 
as a novice normally does. A small tick-mark appended to the diphthong 
sign represents any vowel that immediately follows the diphthong in a stroke. These 
signs are called triphones because they represent three vowels in one sign (see Fig. 5). 
The examples of Fig. 4 illustrate the usage of diphthongs. 

(i) tie   (ii) time   (iii) cowed   (iv) duty 



Fig. 4. Examples illustrating usage of diphthongs 



The usage of triphones in PSL is illustrated with the examples given in Fig. 5. The 
primitives for the triphone signs are similar to those of diphthongs, and their 
recognition conveys the possible occurrence of any vowel next on the stroke. These 
signs are small and the separation of the tick-mark from the symbol of a diphthong is 
a difficult task. Hence, the triphones are assumed to be independent phonetic units 
in PSL, and the separate primitives defined for them are given in Table 3. Another sign, 
a right semicircle, is used to mark the initial sound of w in PSL. The frame structure 
representation is used for storing the knowledge of both Diphthong and Triphone 
symbols. 









(i) Diary   (ii) Loyal   (iii) Towel   (iv) Fewer 



Fig. 5. Examples illustrating usage of Triphones 



The format of the knowledge structure is given in Fig. 6. The hashing technique is 
employed for the implementation of the knowledge bases, namely the Diphthongs’ 
Knowledge Base (DKB) and the Triphones’ Knowledge Base (TKB), owing to their 
small sizes. 

(Frame-name (Primitive (Value, value-string)) 
            (Eng-equivalent (Value, value-string))) 



Fig. 6. Frame structure 
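
A minimal sketch of how such frames could be stored in hashed knowledge bases is given below. It is not the authors' implementation: the class `Frame`, the function `build_kb`, the spelling of the primitive strings and the pairing of each ticked primitive with a diphthong are assumptions made for illustration. A Python `dict` is used because it is itself a hash table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    name: str
    primitive: str        # value-string of the Primitive slot
    eng_equivalent: str   # value-string of the Eng-equivalent slot


def build_kb(frames):
    """Index the frames by primitive; lookup is then a single hash access."""
    return {frame.primitive: frame for frame in frames}


# Diphthongs' Knowledge Base (primitive names follow Table 2, spelling assumed).
DKB = build_kb([
    Frame("I", "Vsign", "I"),
    Frame("OW", "InVsign", "OW"),
    Frame("OI", "greater_sign", "OI"),
    Frame("U", "InUsign", "U"),
])

# Triphones' Knowledge Base (primitive names follow Table 3; the pairing of
# each ticked primitive with a diphthong is an assumption).
TKB = build_kb([
    Frame("I+vowel", "ticked_Vsign", "I"),
    Frame("OW+vowel", "ticked_InVsign", "OW"),
    Frame("OI+vowel", "ticked_greater_sign", "OI"),
    Frame("U+vowel", "ticked_Usign", "U"),
])
```

With only four entries per knowledge base any lookup structure would do; the dictionary simply keeps the access pattern the same if further signs are added.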

Rule Base: The following rules are used in handling diphthongs. 

Rule 1. If the diphthong is before the stroke then 
    substitute the phonetic composition before the stroke details 
    and obtain the corresponding English text composition. 

Rule 2. If the diphthong is after the stroke then 
    substitute the phonetic composition after the stroke details 
    and obtain the corresponding English text composition. 

Rule 3. If there is a right semicircle then 
    the possible strokes are kay, gay, em and ar; 
    substitute w before the stroke details and obtain the 
    corresponding English text composition. 

Rule 4. If w is preceded by a vowel then 
    substitute the vowel before the stroke details, obtain the 
    phonetic composition and further obtain the corresponding English 
    text composition. 



Table 3. Triphone Primitives 



[Triphone symbol column: glyphs, not reproduced] 

Primitives 
(ticked_Vsign) 
(ticked_InVsign) 
(ticked_greater_sign) 
(ticked_Usign) 

4 Algorithm Description 

The proposed algorithm recognizes the diphthongs and triphones and substitutes an 
equivalent English text based on their phonetic composition. The algorithm is meant 
to be used in conjunction with other algorithms that have already been developed and 
published [7,8]. The length of every phonetic construct is obtained in terms of its 
primitives during segmentation. It is observed that the length of a diphthong is one 
and the length of a triphone is two. 



Input: The diphthongs, triphones and their lengths. 

Output: The English equivalent text. 

Stage 1. Creation of the Diphthongs and Triphones Knowledge bases. 

Step 1. The Diphthong and Triphone tables are established. 

Step 2. The DKB and TKB frames are suitably represented. 

Step 3. The knowledge bases are implemented using the hashing technique. 

Stage 2. Recognition of Diphthongs and Triphones. 

Step 1. Accept the primitives and length information from word-level 
segmentation. 

Step 2. Obtain the hash key for the primitive. 

Step 3. If (the length of the phonetic construct = 1) then 
    access the DKB and obtain the corresponding phonetic composition 
    and go to Step 4. 

Step 4. If (the length of the phonetic construct = 2) then 
    access the TKB and obtain the corresponding phonetic composition. 

Step 5. Generate the corresponding English text composition. 

Step 6. With reference to the rule base, substitute the English text in the resulting 
word. 
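
Stage 2 can be sketched as follows. This is an illustration under stated assumptions rather than the published algorithm: the knowledge bases are reduced to plain dictionaries, the primitive spellings and the small `CONSONANTS` table are assumed, and the rule base is simplified to "substitute in the order in which the constructs occur". The names `recognise` and `compose` are hypothetical.

```python
# Assumed miniature knowledge bases: primitive -> phonetic composition.
DKB = {"Vsign": "I", "InVsign": "OW", "greater_sign": "OI", "InUsign": "U"}
TKB = {"ticked_Vsign": "I", "ticked_InVsign": "OW",
       "ticked_greater_sign": "OI", "ticked_Usign": "U"}
CONSONANTS = {"Tee": "T", "Dee": "D", "El": "L", "Ray": "R"}   # subset of Table 1


def recognise(primitive, length):
    """Stage 2, Steps 2-4: choose the knowledge base from the construct length."""
    if length == 1:
        return DKB[primitive]      # diphthong
    if length == 2:
        return TKB[primitive]      # triphone (diphthong sign plus tick)
    raise ValueError(f"unexpected construct length: {length}")


def compose(constructs):
    """Steps 5-6, simplified: concatenate the English equivalents in order."""
    text = []
    for kind, value, *rest in constructs:
        if kind == "stroke":
            text.append(CONSONANTS[value])
        elif kind == "sign":
            text.append(recognise(value, rest[0]))
        else:                       # a plain vowel following the stroke
            text.append(value)
    return "".join(text)
```

The output of `compose` corresponds to the "probable word" of the examples in the next section; turning a probable word into the exact word needs the dictionary support that the paper leaves to later work.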



5 Results and Discussion 

The proposed algorithm has been tested extensively on all possible diphthongs and 
triphones. The following examples illustrate the process given in the algorithm. 

Example of Diphthong: 

Stroke Details:        Stroke + Diphthong + Stroke 
Phonetic Composition:  Tee + OI + El 
English Equivalent:    T + OI + L 
Probable Word:         TOIL 
Exact Word:            TOIL 

Example of Triphone: 

Stroke Details:        Stroke + Triphone + Stroke + Vowel 
Phonetic Composition:  Dee + I + Ray + I 
English Equivalent:    D + I + R + I 
Probable Word:         DIRI 
Exact Word:            Diary 

Table 4 gives a few examples illustrating the usage of diphthongs in PSL. 



Table 4. Examples illustrating usage of Diphthongs 



[Stroke details column: shorthand outlines, not reproduced] 

Phonetic composition   English equivalent   Probable word   Exact word 
TEE+I                  T+I                  TI              TIE 
TEE+I+EM               T+I+M                TIM             TIME 
I+TEE+A+EM             I+T+A+M              ITAM            ITEM 
I+DEE+EL+E             I+D+L+E              IDLE            IDLE 



The triphones are also processed by accessing a knowledge base (TKB), similarly to the 
diphthongs, except that the presence of the tick-mark gives information about the vowel 
following the diphthong, which acts as a priori knowledge for further processing of 
primitives. Table 5 gives a few examples for illustration. 

Table 5. Examples illustrating usage of Triphone 



Phonetic composition   English equivalent   Probable word   Exact word 
EL+OI+E+EL             L+OI+E+L             LOIEL           LOYAL 
TEE+OW+E+EL            T+OW+E+L             TOWEL           TOWEL 
WAY+KAY+E              W+K+E                WKE             WAKE 



6 Conclusion 

The work of automatic generation of an equivalent English text from a PSL document 
comprises different phases. During dictation, the usage of diphthongs is common; 
hence their recognition is important and forms the basis for further 
work on the proposed research. The conflicts that arise in the substitution of words 
can be overcome with domain and contextual knowledge. The subsequent phases 
identified require intelligent dictionary support, a conflict resolution technique for 
homophones, techniques for handling PSL dialects, etc. These issues are quite 
challenging and our present research is in these areas. Total automation requires 
the definition of dictionaries for making the correct substitution. This work is 
currently under investigation. 
Abstract. The Davis and Putnam (D&P) scheme has been intensively 
studied during the last decade. Nowadays, its good empirical performance 
is well known. Here, we deal with its theoretical side, which has 
been studied relatively less until now. We propose a strictly linear 
D&P algorithm for the best-known tractable classes: Horn-SAT 
and 2-SAT. Specifically, the strict linearity of our proposed D&P algorithm 
significantly improves on the previously known complexities, which were 
quadratic for Horn-SAT and even exponential for 2-SAT. As a consequence, 
the D&P algorithm designed to deal with the general SAT problem 
runs as fast (in terms of complexity) as the specialised algorithms 
designed to work exclusively with a specific tractable SAT subclass. 

Keywords: Automated Reasoning, Computational Complexity, Search, 
Theorem Proving. 



1 Introduction 

Since the beginning of the current decade [8,1], the well-known scheme of 
Davis and Putnam (D&P) [5], whose most appropriate algorithmic description 
was given in [4], has proved to be faster than many other elaborate schemes. 

Throughout this decade, algorithms following the D&P scheme have been compared 
empirically, and successfully, with other competitive algorithms. The scheme 
has been used extensively for analysing the phase transition phenomenon [3,11] that 
emerges when solving randomly generated SAT instances. Moreover, in recent 
years the D&P mechanism has been essential in the study of heuristics 
[12,10,2] for propositional theorem proving. Furthermore, finding high-performance 
algorithms for some real-life applications, e.g. [13], has relied on this 
algorithmic scheme as well. 

Several proposals [3,17,14,16,15] of different implementations of algorithms 
stemming from the D&P principle can also be found. These implementations, 
based on suitable data structures, make it possible to scan the search space quickly. 

In this article, we propose a data structure for the D&P scheme and a new 
inference rule which allow us to claim that the Davis and Putnam method is 
strictly linear for the Horn-SAT and 2-SAT subclasses. Thus, we push beyond the 
currently known efficiency of the Davis and Putnam method, since only quadratic 
and even exponential complexities had been obtained for such subclasses [14,6] 
until now. 

Therefore, our goal in this article is three-fold and concerns the efficiency 
of the D&P principle: 

1. To propose data structures which enable the search space to be traversed 
rapidly. 

2. To introduce a new inference rule, called polarised formula, that prunes large 
search spaces. 

3. To prove that D&P is as fast (in terms of complexity) as the 
specialised algorithms designed to deal only with a specific tractable class 
(e.g. Horn, 2-SAT) of instances. 

The proposed data structures are not complex; they are based 
on classical data structures such as flags, counters, pointers and lists. 

The organisation of the article is as follows. The next section presents the 
classical notions of the SAT problem and those of the D&P scheme. Afterwards, 
in section 3, an informal description of the proposed algorithm is given. Section 
4 specifies the first D&P procedure. Then the changes to the algorithm needed 
to select unit clauses first are described. In section 6, a new inference rule for 
the D&P method is defined. Finally, it is claimed that the algorithm stemming 
from the D&P scheme is linear for several well-known tractable classes. 

Due to a lack of space, the proofs of the theorems have been omitted. More 
details about the content of this article can be found in [9]. 

2 Preliminaries 

Let us recall the basics and the classical terminology associated with the SAT 
problem and the D&P algorithmic principle. 

Basic Terminology. The number of different propositions is assumed to 
be n. A positive literal is a proposition p and a negative literal is a 
complemented proposition ¬p. The complemented literal of L is noted ¬L. A clause, 
noted C, is a set of literals which may be empty. A formula, noted F, is a set of 
clauses which may be empty. 

Satisfiability. An interpretation I assigns to each literal a value in {0, 1} 
and verifies I(p) = 1 - I(¬p). An interpretation satisfies a clause iff it assigns 1 
to at least one of its literals. An interpretation satisfies a formula iff it satisfies 
all of its clauses and in this case, the interpretation is called a model. A formula 
is satisfiable iff there exists at least one interpretation that satisfies it. 
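
These definitions translate directly into a few lines of code. In the sketch below, which is only an illustration, propositions are assumed to be numbered 1..n, a literal is a non-zero integer (p or -p, as in the DIMACS convention) and a formula is a list of sets of literals; the function names are hypothetical.

```python
from itertools import product


def satisfies(interp, formula):
    """interp maps proposition -> 0/1; the literal -p takes the value 1 - I(p)."""
    value = lambda lit: interp[lit] if lit > 0 else 1 - interp[-lit]
    return all(any(value(lit) == 1 for lit in clause) for clause in formula)


def is_satisfiable(formula, n):
    """Exhaustive check over all 2**n interpretations (illustration only)."""
    for bits in product((0, 1), repeat=n):
        interp = dict(zip(range(1, n + 1), bits))
        if satisfies(interp, formula):
            return True
    return False
```

An empty clause is never satisfied and an empty formula is always satisfied, which matches the two halting tests of the D&P scheme.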

Partial interpretation. An interpretation that maps only 0 < k < n 
propositions satisfies a certain subset of the clauses of a formula. If the partial 
interpretation satisfies all the clauses, it is called a partial model. All the 
interpretations covering this partial model are also models. If a partial interpretation 
unsatisfies all the literals of at least one clause then all the interpretations 
covering this partial interpretation unsatisfy the formula in question too. 

D&P Scheme. Each state of the D&P algorithm is associated with a set 
of k < n literals. A set can include either a literal or its complement but not 
both. Each set is associated with the Current Partial Interpretation (CPI) that 
satisfies each literal in the set. A straightforward version of the D&P solver 
is given below. The procedure (Inference F p) returns a new formula 
obtained from the original one by removing from it the clauses containing p and 
the occurrences of ¬p. 

Algorithm 1 D&P scheme. The function pick.literal selects a 
literal p from the formula F that is not in CPI. 

D&P(F, CPI) 
1 If F = {} then HALT(sat) 
2 If {} ∈ F then Return(unsat) 
3 p ← pick.literal 
4 D&P((Inference F p), CPI ∪ {p}) 
5 Return(D&P((Inference F ¬p), CPI ∪ {¬p})) 

Theorem 1. D&P is correct: D&P(F, {}) returns sat iff F is satisfiable. 
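
The scheme can be written almost literally as a recursive function. The sketch below copies the formula at every call, so it illustrates the scheme only and not the linear-time data structure developed in the following sections. It assumes integer literals and a formula given as a list of frozensets, and it returns the partial model instead of halting; `pick.literal` is replaced by taking an arbitrary literal of the first remaining clause.

```python
def inference(formula, lit):
    """(Inference F p): drop clauses containing lit, delete occurrences of -lit."""
    return [clause - {-lit} for clause in formula if lit not in clause]


def dp(formula, cpi=frozenset()):
    """Return a (partial) model as a frozenset of literals, or None if unsat."""
    if not formula:                                # line 1: F = {}
        return cpi
    if any(not clause for clause in formula):      # line 2: {} in F
        return None
    lit = next(iter(formula[0]))                   # line 3: a literal still in F
    model = dp(inference(formula, lit), cpi | {lit})      # line 4
    if model is not None:
        return model
    return dp(inference(formula, -lit), cpi | {-lit})     # line 5
```

Every clause disappears from the formula only when one of its literals has been added to the CPI, so a returned set satisfies each clause of the input even when it does not mention every proposition.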

Remark. The rules of clause subsumption and pure literal elimination are 
not considered. Both rules involve a high computational cost and are rarely 
useful in practice. 

D&P algorithm. We distinguish between a D&P algorithm and the 
D&P scheme as follows. The D&P scheme description above omits the data 
structure employed to represent F and CPI, and also the specific instructions in 
pseudo-code. A D&P scheme whose data structure and computer 
instructions are completely specified is called a D&P algorithm. 

Remark. The following two statements are equivalent: 

(1) The complexity of the D&P scheme is in O(f(n)) and, 

(2) The best complexity of a D&P algorithm is in O(f(n)). 

Thus, in the sequel we use statements of type (1). 

As mentioned, each state of the search space can be associated with the 
current partial interpretation CPI. This is formed by the literals which are 
added incrementally in each recursive call. 

The basic data structure consists of: 

1. For each literal L: 1) clauses(L) is the set of clauses including L and, 2) 
new.sat.clauses(L) is the subset of clauses(L) not satisfied by the CPI. 

2. For each clause C: state(C) = sat iff C is satisfied by the CPI. 

3. For the formula F: counter(F) indicates how many clauses are unsatisfied 
by the CPI. 

3 Informal Description of the Algorithm 

Bearing in mind both the first general description of the D&P scheme and the 
described data structure, we can informally describe our specific algorithm as 
follows: 

- Steps 1, 2 and 3. They are straightforwardly implemented. 

- Step 4. The function (Inference F p) is accomplished in two steps, 4.A and 
4.B. These steps are related respectively to the unit-subsumption and to the 
unit-resolution inferences. 

- Step 4.A: Unit-subsumption. After a literal L is picked in step 3, all the 
clauses C unsatisfied by the CPI but containing L are marked satisfied, 
namely state(C) = sat. The counter of F is decremented once for each 
newly satisfied clause. Whenever this counter reaches zero, 
all the clauses are satisfied and thus the algorithm halts, returning “sat”. 

- Step 4.B: Unit-resolution. When counter(F) ≠ 0 the process is continued by 
decrementing the counters corresponding to clauses having occurrences of 
¬L. If no counter reaches zero then another proposition is selected and 
steps (1)-(4) are iterated (this is done by calling the main function 
D&P recursively). If one of them reaches zero, the CPI unsatisfies the formula and 
hence the algorithm stops the search beyond the CPI and backtracks. This 
implies that the operations done in the last step 4 must be undone. Thus, 
for each C ∈ new.sat.clauses(L), state(C) is set again to unsat and counter(F) 
is incremented, and finally new.sat.clauses(L) is set to ∅. These operations 
are continued incrementally until returning to the last pending recursive call 
corresponding to a step 5. 

- Step 5. The process continues the search for models containing CPI ∪ {¬L}. 
The operations in step 5 are the same as those of step 4, with L replaced 
by ¬L. 

4 A Basic Algorithm Issued from the D&P Scheme 

First, we present the four procedures which form the skeleton of our proposed 
algorithm. Unit-subsumption (resp. Unit-resolution) is implemented by the 
procedure called Remove.clauses (resp. Remove.literals), and in the 
backtracking process its steps are undone by the procedure Restore.clauses (resp. 
Restore.literals). This first algorithmic version is intended to help the reader 
understand the more elaborate, definitive algorithm which will be 
given later. 

Procedure 1 Remove-clauses(L). It removes from F the clauses satisfied by L
by decrementing counter(F) once for each clause satisfied by L and not
already satisfied by the current CPI. If this counter reaches zero, the
procedure halts the whole satisfiability test and returns “sat”.

Remove-clauses(L)
    new.sat.clauses(L) ← ∅
    ∀C ∈ Clauses(L) s.t. state(C)=unsat do:
        Add C to new.sat.clauses(L)
        state(C) ← sat
        Decrement counter(F)
    If counter(F)=0 then HALT(sat)
End

Procedure 2 Restore-clauses(L). It undoes the operations carried out by the
procedure Remove-clauses(L).

Restore-clauses(L)
    ∀C ∈ new.sat.clauses(L) do:
        state(C) ← unsat
        Increment counter(F)
    new.sat.clauses(L) ← ∅
End

Procedure 3 Remove-literals(L). It removes all the occurrences of L. If at
least one clause becomes empty, then the CPI falsifies all the literals of
that clause and thus the boolean flag UNSAT is set to True. Otherwise, the
procedure ends with the flag UNSAT = False.

Remove-literals(L)
    UNSAT ← False
    ∀C ∈ Clauses(L) do:
        Decrement Counter(C)
        If Counter(C)=0 then UNSAT ← True
End

Procedure 4 Restore-literals(L). It undoes the operations performed by the
procedure Remove-literals(L).
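
The counter-based bookkeeping behind Procedures 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `Formula`, the attribute names and the encoding of literals as non-zero integers (p and -p) are all hypothetical, and HALT(sat) is replaced by a boolean return value.

```python
class Formula:
    """CNF formula with one counter per clause and one for the formula (assumed layout)."""

    def __init__(self, clauses):
        self.clauses = [list(c) for c in clauses]
        self.state = ["unsat"] * len(self.clauses)        # state(C)
        self.counter_c = [len(c) for c in self.clauses]   # Counter(C): literals not falsified
        self.counter_f = len(self.clauses)                # counter(F): clauses not satisfied
        self.occ = {}                                     # Clauses(L): clauses containing L
        for i, clause in enumerate(self.clauses):
            for lit in clause:
                self.occ.setdefault(lit, []).append(i)
        self.new_sat = {}                                 # new.sat.clauses(L)

    def remove_clauses(self, lit):
        """Unit-subsumption. Returns True when every clause is satisfied."""
        marked = [i for i in self.occ.get(lit, []) if self.state[i] == "unsat"]
        for i in marked:
            self.state[i] = "sat"
        self.counter_f -= len(marked)
        self.new_sat[lit] = marked
        return self.counter_f == 0

    def restore_clauses(self, lit):
        """Undo remove_clauses(lit)."""
        for i in self.new_sat.pop(lit, []):
            self.state[i] = "unsat"
            self.counter_f += 1

    def remove_literals(self, lit):
        """Unit-resolution on the falsified literal lit. Returns the UNSAT flag."""
        unsat = False
        for i in self.occ.get(lit, []):
            self.counter_c[i] -= 1
            if self.counter_c[i] == 0:
                unsat = True
        return unsat

    def restore_literals(self, lit):
        """Undo remove_literals(lit)."""
        for i in self.occ.get(lit, []):
            self.counter_c[i] += 1
```

Because each operation only touches the clauses in which the literal occurs, undoing a step costs no more than performing it, which is what makes the backtracking of step 4 cheap.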

Now using these procedures we can construct our first D&P algorithm. 

Algorithm 2 Preliminary D&P algorithm. Pick-literal selects a literal from
the current F, namely the formula that results after applying (Inference F L)
consecutively for each literal L in the CPI. In other words, it selects a
literal such that CPI(L)=CPI(¬L)=Not. Its definition is straightforward and
is therefore omitted. Similarly, the initialisation of the data structure is
not given here.

D&P
    L ← Pick-literal
    Remove-clauses(L), Remove-literals(¬L)
    If UNSAT = False then D&P
    Restore-clauses(L), Restore-literals(¬L)
    Remove-clauses(¬L), Remove-literals(L)
    If UNSAT = False then D&P
    Restore-clauses(¬L), Restore-literals(L)
    Return(unsat)
End

Remark. Notice that this algorithm is exactly the same as the previous one:
the function (Inference F p) is realised by the two procedures Remove-clauses
and Remove-literals.

Theorem 2. D&P’s correctness. D&P returns unsat iff F is unsatisfiable. 

The proof is straightforward from the definition of the procedures 1 to 4 and 
theorem 1. 
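
The splitting scheme whose correctness is stated above can be illustrated with a short functional sketch. It is an assumption-laden simplification: instead of counters and explicit undo operations it copies the simplified formula at each branch, literals are encoded as non-zero integers, and the names `simplify` and `dp` are hypothetical.

```python
def simplify(clauses, lit):
    """Apply (Inference F lit): drop clauses containing lit, delete -lit elsewhere."""
    out = []
    for clause in clauses:
        if lit in clause:
            continue                        # unit-subsumption
        reduced = clause - {-lit}           # unit-resolution
        if not reduced:
            return None                     # empty clause: the branch is unsatisfiable
        out.append(reduced)
    return out


def dp(clauses):
    """Return a set of literals satisfying all clauses, or None if none exists.

    `clauses` is a list of frozensets of non-zero integers.
    """
    if any(not clause for clause in clauses):
        return None                         # an empty clause can never be satisfied
    if not clauses:
        return set()                        # every clause is satisfied
    lit = next(iter(clauses[0]))            # stand-in for Pick-literal
    for choice in (lit, -lit):              # branch on L, then on its complement
        rest = simplify(clauses, choice)
        if rest is not None:
            model = dp(rest)
            if model is not None:
                return model | {choice}
    return None
```

Copying the formula at each branch trades memory for simplicity; the counter-based procedures of this section avoid the copies by undoing each inference on backtracking.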

Restore-literals(L)
    ∀C ∈ Clauses(L) do:
        Increment Counter(C)
End

The last version of the D&P procedure can be simplified by integrating the
operations of Remove-clauses(L) and Remove-literals(¬L) into one procedure,
which we shall call Inference(L) and use from now on. Similarly,
Restore-clauses and Restore-literals are merged into Undo-Inference(L). We
can also modify Remove-literals slightly so that it returns unsat instead of
using the previous flag UNSAT. Thus, we have:

Remove-literals(L)
    UNSAT ← False
    ∀C ∈ Clauses(L) do:
        Decrement Counter(C)
        If Counter(C)=0 then UNSAT ← True
    If UNSAT return(unsat) Else return(sat)
End

D&P
    L ← Pick-literal
    If Inference(L) ≠ unsat do D&P
    Undo-Inference(L)
    If Inference(¬L) ≠ unsat do D&P
    Undo-Inference(¬L)
    Return(unsat)
End

5 Selecting Unit Clauses 

As is well known, the D&P scheme becomes faster if in step 3 one chooses
literals L from unit clauses. The intuitive reason is that the subsequent
search in the branch of the complemented literal, CPI ∪ {¬L}, is trivially
unsatisfiable and therefore need not be executed.

The basic idea is to take a literal from a unit clause and remove all of its
complemented occurrences from the formula. These literal removals may give
rise to new unit clauses. A direct generalisation of this principle is as
follows: repeat the selection of a unit clause and the subsequent removal of
its complemented literals until no new unit clauses are generated; if an
empty clause is produced, report unsatisfiability.

Efficient implementations of this strategy, called unit propagation, can be
found in [7,3]. In [17,16] a somewhat different principle is suggested, which
is claimed to improve on the one proposed in [7,3].
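
The repeat-until-no-unit-clause principle described above can be sketched as follows. This is a deliberately naive illustration, assuming literals are non-zero integers and using the hypothetical name `unit_propagate`; it rescans the formula for every unit clause and is therefore not one of the linear-time implementations cited.

```python
def unit_propagate(clauses):
    """Propagate unit clauses until none is left.

    Returns (derived_literals, remaining_clauses), or None if an empty
    clause is produced (the formula is unsatisfiable).
    """
    clauses = [frozenset(c) for c in clauses]
    derived = []
    while True:
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            return derived, clauses          # no unit clause left
        (lit,) = unit
        derived.append(lit)
        remaining = []
        for clause in clauses:
            if lit in clause:
                continue                     # clause satisfied by lit
            reduced = clause - {-lit}        # remove complemented occurrences
            if not reduced:
                return None                  # empty clause produced
            remaining.append(reduced)
        clauses = remaining
```

For instance, on the Horn formula {1}, {-1, 2}, {-2} the loop derives 1, then 2, and then produces an empty clause from {-2}, so the sketch reports unsatisfiability without any splitting.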

In order to embed the Unit-propagation procedure properly in our D&P, we add
two data structures: CPI(L), defined by CPI(L) = Yes iff L ∈ CPI, and a list
called Computed.units, containing the literals of the unit clauses that
emerge during the Unit-propagation process.

Procedure 5 Unit-propagation. Computed.units is a local variable, whereas
unit is a set of literals initialised in the initialisation procedure (once
at the beginning) and in D&P (in each recursive call of D&P). Inference(L)
(resp. Undo-Inference(L)) includes the instruction CPI(L) ← Yes (resp.
CPI(L) ← Not).

Unit-propagation
    flag ← SAT; Computed.units ← ∅
    While unit ≠ ∅ and flag = SAT do:
        L ← pop(unit); flag ← Inference(L); Add L to Computed.units
    If flag ≠ SAT then do:
        ∀L ∈ Computed.units do: Undo-Inference(L)
        Return(unsat)
    Else Return(Computed.units)
End

Theorem 3. Literal soundness. If L is pushed into unit then F ⊨ L.

Theorem 4. Soundness. If Unit-propagation returns unsat then F is
unsatisfiable.

Theorem 5. Literal completeness. F ⊨ L iff L is pushed into unit.

Theorem 6. Completeness. If F is an unsatisfiable Horn instance then
Unit-propagation returns unsat.

Theorem 7. Correctness. If F is a Horn instance then Unit-propagation
returns unsat iff F is unsatisfiable.

Theorem 8. Linear complexity. Unit-propagation ends in O(size(F)) time.

Next, we detail the D&P algorithm improved with the unit-propagation strategy.



Algorithm 3 D&P algorithm. Computed.units is a local variable. The procedure
Inference(L) (resp. Undo-Inference(L)) sets CPI(L) to Yes, pushes L into unit
and calls successively Remove-clauses(L) and Remove-literals(¬L) (resp.
undoes all these operations).

Inference(L)
    CPI(L) ← Yes
    push(unit, L)
    Remove-clauses(L)
    return(Remove-literals(¬L))
End

Undo-Inference(L)
    CPI(L) ← Not
    Restore-clauses(L)
    Restore-literals(¬L)
End

D&P
    L ← Pick-literal
    If Inference(L) ≠ unsat do:
        Computed.units ← Unit-propagation
        If Computed.units ≠ unsat do: D&P
        ∀L' ∈ Computed.units do: Undo-Inference(L')
    Undo-Inference(L)
    If Inference(¬L) ≠ unsat do:
        Computed.units ← Unit-propagation
        If Computed.units ≠ unsat do: D&P
        ∀L' ∈ Computed.units do: Undo-Inference(L')
    Undo-Inference(¬L)
    Return(unsat)
End



It is well known that integrating the Unit-propagation procedure into the
D&P scheme is essential for obtaining a good complexity on Horn instances.
This will be dealt with in the last section.

Theorem 9. The D&P algorithm above returns unsat iff F is unsatisfiable.

The proof follows from Theorem 2 (correctness of the previous D&P),
Theorems 3 to 6 (correctness of Unit-propagation) and the description of the
algorithm above.

6 Detection of Polarised Formulas

In this section, we introduce some notions to speed up the satisfiability test with 
a new inference rule. 

Definition 1. Polarised formulas. We say that a formula has positive (resp.
negative) polarity if each clause has at least one positive (resp. negative)
literal. Formulas with polarity will be called polarised formulas.

Corollary 1. Polarised satisfiability. A polarised formula is trivially
satisfiable.

Indeed, a model is obtained by assigning 0 (resp. 1) to each propositional
variable of a negative (resp. positive) polarised formula. Thus, when faced
with a polarised formula, we can save a great deal of running time by
avoiding the subsequent splitting steps that would otherwise be performed
until all clauses are satisfied. Next, we propose some data structures and
algorithmic operations to detect polarised formulas. We prove that this
detection is performed in constant time O(1) and hence a significant
improvement is achieved in testing the satisfiability of formulas.
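
Definition 1 and Corollary 1 can be checked on small formulas with the sketch below. It assumes literals encoded as non-zero integers and uses hypothetical function names. Note that it detects polarity by scanning every clause, which takes linear time; the constant-time detection claimed above relies on the counters introduced next, not on this scan.

```python
def polarity(clauses):
    """Return "positive", "negative" or None according to Definition 1."""
    if all(any(lit > 0 for lit in clause) for clause in clauses):
        return "positive"
    if all(any(lit < 0 for lit in clause) for clause in clauses):
        return "negative"
    return None


def trivial_model(clauses):
    """Model of a polarised formula: all variables 1 (positive) or all 0 (negative)."""
    kind = polarity(clauses)
    if kind is None:
        return None
    value = kind == "positive"
    return {abs(lit): value for clause in clauses for lit in clause}


def satisfies(model, clauses):
    """True if every clause contains a literal made true by the model."""
    return all(any(model[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses)
```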

The previous counter counter(C) of the literals of a clause C is replaced by
two counters. Similarly, for the formula, counter(F) is now split into two
counters. Thus, we have the following data structure:

- For each clause C: pos.counter(C) (resp. neg.counter(C)) indicates the
number of positive (resp. negative) literals in C not falsified by the CPI.

- For the formula F: pos.counter(F) and neg.counter(F) indicate respectively
the number of positive and negative clauses unsatisfied by the CPI.

The updated Remove-clauses(L) and Remove-literals(L) are:

Remove-clauses(L)
    new.sat.clauses(L) ← ∅
    ∀C ∈ Clauses(L) do:
        If state(C)=unsat then do:
            Add C to new.sat.clauses(L)
            state(C) ← sat
            If neg.counter(C)=0 do: decrement pos.counter(F)
            If pos.counter(C)=0 do: decrement neg.counter(F)
    If pos.counter(F)=0 or neg.counter(F)=0 then HALT(sat)
End

Remove-literals(L)
    UNSAT ← False
    ∀C ∈ Clauses(L) do:
        If L=p do: decrement pos.counter(C)
        If L=¬p do: decrement neg.counter(C)
        If pos.counter(C)=0 and neg.counter(C)>0 do:
            increment neg.counter(F)
        If pos.counter(C)>0 and neg.counter(C)=0 do:
            increment pos.counter(F)
        If pos.counter(C)=0 and neg.counter(C)=0 do: UNSAT ← True
        If pos.counter(C) + neg.counter(C)=1 do:
            Search for L' s.t. CPI(L')=CPI(¬L')=Not
            push(L', unit)
    If UNSAT return(unsat) Else return(sat)
End

Similarly, the corresponding procedures Restore-clauses(L) and
Restore-literals(L) are easily adapted to the new operations integrated in
the two Remove procedures. In addition, the new data structures, namely
neg.counter(C), pos.counter(C), pos.counter(F) and neg.counter(F), need to be
appropriately initialised.

Algorithm 4 The definitive D&P algorithm is the same as the previous one,
defined in Section 5, simply substituting the functions of Section 4 by those
described above.

Theorem 10. D&P correctness. Algorithm 4 returns unsat iff F is
unsatisfiable.

7 Some Worst-Case Complexity of the D&P Scheme 

Henceforth, we shall write “D&P” instead of “the definitive D&P algorithm 4” . 

The polarised-formula inference rule is essential for obtaining the
following complexity results.

Theorem 11. Horn Instances. D&P is strictly linear for Horn instances. 
Theorem 12. 2-SAT instances. D&P is strictly linear for the 2-SAT problem. 

Proof sketch. The following facts are at the core of the proof:

(1) When a literal L is selected then Unit.propagation(L) is executed. 

(2) Unit.propagation(L) ends when no unit resolution is applicable. 

(3) Unit.propagation(L) stops after having removed some binary clauses. 

(4) The original formula is satisfiable iff the remaining set of binary clauses is 
satisfiable. 

(5) There is at most one backtracking point corresponding to the branch -L. 

(6) The total running time is at most proportional to 2·size(F).

8 Conclusion 

In this article, we have studied the D&P scheme in depth in order to design
an efficient algorithm for the Satisfiability problem. Our main results have
been: (1) to design an appropriate, simple data structure to perform the
inferences efficiently; (2) to furnish a new sound inference rule called
Polarised Formula; (3) to analyse several worst-case behaviours of the
proposed algorithm; and (4) to demonstrate that an algorithm stemming from
the D&P scheme can run, on certain tractable instances of high interest in
practical applications, as fast as the published algorithms specially
designed to deal only with such tractable classes. Thus, we enhance the
theoretical virtues of the D&P method for propositional theorem proving.

Abstract. Many hard practical problems such as Time Tabling and
Scheduling can be formulated as Constraint Satisfaction Problems. For 
these CSPs, powerful problem-solving methods are available. However, 
in practice, the problem definition may change over time. Each separate 
change may invoke a new CSP formulation. The resulting sequence of 
CSPs is denoted as a Dynamic CSP. 

A valid solution of one CSP in the sequence need not be a solution of 
the next CSP. Hence it might be necessary to solve every CSP in the 
sequence forming a DCSP. Successive solutions of the sequence of CSPs 
can differ quite considerably. In practical environments large differences 
between successive solutions are often undesirable. To cope with this hin- 
drance, the paper proposes a repair-based algorithm, i.e., a Local Search 
algorithm that systematically searches the neighborhood of an infringed 
solution to find a new nearby solution. The algorithm combines local 
search with constraint propagation to reduce its time complexity. 



1 Introduction 

Many hard practical problems such as Time Tabling and Scheduling can
successfully be solved by formulating them as Constraint Satisfaction
Problems. This success can be attributed to three factors: (i) formulating a
problem as a CSP does not force us to relax any of the problem requirements;
(ii) there are efficient general-purpose solution methods available for CSPs;
(iii) known mathematical properties of the problem requirements can be used
to speed up the search process by pruning the search space.

Despite this success, using a CSP formulation in a dynamic environment is
not without disadvantages. The main obstacle is that in a dynamic environment
the problem definition may change over time. For instance, machines may break
down, employees can become ill and the earliest delivery time of material may
change. Hence, a solution of a CSP need no longer be valid after a condition
of the problem definition has been changed. To describe these changing CSPs,
Dechter and Dechter [2] introduced the notion of Dynamic CSPs (DCSP). A DCSP
can be viewed as a sequence of static CSPs, describing different situations
over time. In a DCSP, an operational solution may be infringed when a new
situation arises. So, we have a problem statement and a complete value
assignment to the variables for which some constraints are now violated.

When a previously correct solution is no longer valid, a new solution must be
generated. The simplest way to do this is to generate a new solution from
scratch. This is, however, not always what we want. First, we may have to
repeat a large amount of the work spent on solving the previous CSP. Several
proposals to handle this problem have been made [1, 5, 7, 8]. Second, a
serious obstacle is that a new solution might be completely different from
the previous one. If, for example, the problem at hand is the scheduling of
people working in a hospital, then such an approach may result in changing
all the night shifts, weekend shifts and corresponding compensation days of
all employees just because someone became ill. Since people must also be able
to plan their private lives, this will definitely result in commotion among
the displeased employees.

Instead of creating a whole new solution, we should try to repair the infringed 
solution. Here, the goal is to move from a candidate solution that violates some 
constraints because of an unforeseen incident, to a nearby solution that meets the 
constraints. What is considered nearby may depend on the application domain. 

Whether a new solution is nearby the infringed solution is determined by the 
cost of changing to the new solution. In our experiments, the cost of changing 
to a new solution is determined by the number of variables that are assigned a 
new value. Notice that the presented algorithm is also suited for other ways of 
assigning costs. For instance, we could use the weighted distances between the 
new and the old values assigned to each variable, the importance of the variables 
that must change their value, and so on. 
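
The two ways of assigning costs mentioned above can be made concrete with a small sketch. The function names and the dictionary representation of assignments are assumptions made for illustration; the weighted variant further assumes numeric values so that a distance can be computed.

```python
def change_cost(old, new, cost=lambda v, new_value, old_value: 1):
    """Total cost of moving from assignment `old` to assignment `new`.

    Only variables whose value differs contribute. The default charges 1 per
    changed variable, so the result is the number of changed variables.
    """
    return sum(cost(v, new[v], old[v]) for v in old if new[v] != old[v])


def weighted_distance(weights):
    """Per-variable cost: the variable's weight times |new value - old value|."""
    return lambda v, new_value, old_value: weights[v] * abs(new_value - old_value)
```

With the default cost, an unchanged assignment costs 0, in line with the requirement that keeping the infringed solution where possible is free.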

Verfaillie and Schiex [9] have proposed a solution method for DCSPs that ex- 
plores the neighborhood of a previously correct solution. They start by unassign- 
ing one variable for each violated constraint. On the set of unassigned variables, 
they subsequently apply a constructive CSP solution method, such as backtrack- 
ing search with forward checking. This solution method may make new assign- 
ments that are inconsistent with the still assigned variables. When this happens, 
they repeat the solution method: for each violated constraint one of the still as- 
signed variables will be unassigned. Clearly, this approach may choose the wrong 
variables to be unassigned, resulting in a sub-optimal solution with respect to 
the distance to the starting point, the infringed solution. 

Local Search is a solution method that moves from one candidate solution to 
a nearby candidate solution [4] . Unfortunately, there is no guarantee that it will 
find the most nearby solution. In fact Local Search may also wander off in the 
wrong direction. Another problem with using Local Search is the speed of the 
search process. Local Search does not use something like constraint propagation 
to reduce the size of the search space. The search space is completely determined 
by the number of variables and the number of values that can be assigned to the 
variables. 

What we wish to have is a solution method that can systematically search
the neighborhood of the infringed solution, taking advantage of the powerful
constraint propagation methods. In this paper we will show that such a
repair-based approach is possible. The proposed approach combines Local
Search with Constraint Propagation.

The remainder of this paper is organized as follows. Section 2 defines the 
Constraint Satisfaction Problem. Section 3 presents a repair-based approach for 
CSPs and Section 4 specifies the algorithm. Section 5 provides some formal 
results and Section 6 describes the experimental results. Section 7 discusses 
related work and Section 8 concludes the paper. 

2 Preliminaries 

We consider a CSP consisting of (1) a set of variables, (2) for each variable a set 
of domain values that can be assigned to the variable, and (3) a set of constraints 
over the variables. 

Definition 1. A constraint satisfaction problem is a triple
$(V, \mathcal{D}, C)$.

— $V = \{v_1, \dots, v_n\}$ is a set of variables.

— $\mathcal{D} = \{D_{v_1}, \dots, D_{v_n}\}$ is a set of domains.

— $C$ is a set of constraints. A constraint

$$C_{v_{i_1},\dots,v_{i_k}} : D_{v_{i_1}} \times \dots \times D_{v_{i_k}} \to \{\mathit{true}, \mathit{false}\}$$

is a mapping to true or false for an instance of
$D_{v_{i_1}} \times \dots \times D_{v_{i_k}}$.

We can assign values to the variables of a CSP.

Definition 2. An assignment $a : V \to \bigcup \mathcal{D}$ for a CSP
$(V, \mathcal{D}, C)$ is a function that assigns values to variables. For
each variable $v \in V$: $a(v) \in D_v$.

We are of course interested in assignments that are solutions for a CSP.

Definition 3. Let $(V, \mathcal{D}, C)$ be a CSP and let
$a : V \to \bigcup \mathcal{D}$ be an assignment. $a$ is a solution for the
CSP if and only if for each $C_{v_{i_1},\dots,v_{i_k}} \in C$:

$$C_{v_{i_1},\dots,v_{i_k}}(a(v_{i_1}), \dots, a(v_{i_k})) = \mathit{true}.$$
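
Definitions 1 to 3 translate directly into code. The sketch below assumes one possible encoding, which is not prescribed by the paper: domains are a dictionary from variable names to sets of values, and each constraint is a pair of a scope (a tuple of variable names) and a predicate over the values of that scope.

```python
def is_assignment(domains, a):
    """Definition 2: a gives every variable a value from its own domain."""
    return a.keys() == domains.keys() and all(a[v] in domains[v] for v in domains)


def is_solution(domains, constraints, a):
    """Definition 3: every constraint maps the assigned values of its scope to true."""
    return is_assignment(domains, a) and all(
        predicate(*(a[v] for v in scope)) for scope, predicate in constraints
    )


# Hypothetical example: three variables with not-equal constraints u != v, v != w.
def not_equal(x, y):
    return x != y

example_domains = {"u": {1, 2}, "v": {1, 2}, "w": {1, 2}}
example_constraints = [(("u", "v"), not_equal), (("v", "w"), not_equal)]
```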

3 Ideas behind Repair-Based CSP Solving 

The problem we have to solve is the following. We have a CSP
$(V, \mathcal{D}, C)$ and an assignment a that assigns values to all
variables. The assignment used to satisfy all constraints, but because of an
incident some constraints have changed or new constraints have been added. As
a result the assignment a no longer satisfies all constraints. Now we have to
find a new assignment satisfying the constraints. The cost of changing the
infringed solution to the new solution must, however, be minimal. We assume
that the cost of changing to a new solution can be expressed as the sum of
the costs of changing a single variable. Moreover, we assume that these costs
are non-negative. The cost of no change is of course 0.

Since some of the constraints are violated, at least some of the variables
involved in these constraints must change their values. To illustrate the
idea of repair-based CSP solving, let us assume that a unary constraint has
been added and that the value assigned to the variable v is not consistent
with this new constraint. Clearly, we must find a new assignment for v. If
there exists a value in the domain of v that is consistent with the new
constraint and with all other constraints of the CSP, then we can assign this
value to v. However, as Murphy’s law predicts, this will often not be the
case. So, what can we do?

The goal is to find the most nearby solution such that all constraints are 
satisfied. This requires a systematic exploration of the search space. If we fail 
to find a solution by changing the assignment of one variable, we must look 
at changing the assignment of two variables. If this also fails, we go to three 
variables, and so on. Hence, we must search the neighborhood of the variable v 
using an iterative deepening strategy. 
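
The iterative deepening idea, trying one changed variable, then two, then three, can be sketched by brute force. This is only an illustration of the search order with unit costs, not the algorithm of this paper: it enumerates subsets of all variables rather than of a dynamically determined candidate set, uses no constraint propagation, and the name `nearest_solution` and the constraint encoding (scope, predicate) are assumptions.

```python
from itertools import combinations, product


def nearest_solution(domains, constraints, a):
    """Return a solution that changes as few variables of `a` as possible, or None."""
    def satisfied(b):
        return all(pred(*(b[v] for v in scope)) for scope, pred in constraints)

    variables = sorted(domains)
    for k in range(len(variables) + 1):            # deepen: change k variables
        for subset in combinations(variables, k):
            # Each selected variable must receive a value different from a[v].
            alternatives = [[d for d in domains[v] if d != a[v]] for v in subset]
            for values in product(*alternatives):
                b = dict(a)
                b.update(zip(subset, values))
                if satisfied(b):
                    return b
    return None
```

Since every subset of size k is tried before any subset of size k + 1, the first solution found changes the least number of variables, at the price of the exponential enumeration discussed below.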

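Let us return to the general case. Let X be the set of variables involved in
the constraints that do not hold:

$$X = \bigcup \{\, \{v_{i_1},\dots,v_{i_k}\} \mid C_{v_{i_1},\dots,v_{i_k}} \in C,\ C_{v_{i_1},\dots,v_{i_k}}(a(v_{i_1}),\dots,a(v_{i_k})) = \mathit{false} \,\}$$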

Obviously, at least one of the variables in X must get a new value. So, we can 
start investigating whether changing the assignment of one variable in X enables 
us to satisfy all constraints. If no such variable can be found then there are two 
possibilities. 

— At least one of the other variables in X must also be assigned another value. 

This is the case if 



Hence, we must investigate changing the assignment of two variables in X . 

— There is an assignment to a variable v in X such that all constraints
between the variables of X are satisfied, but for which a constraint
$C_{v_{i_1},\dots,v_{i_k}}$ with $v \in \{v_{i_1},\dots,v_{i_k}\}$ and
$\{v_{i_1},\dots,v_{i_k}\} \not\subseteq X$ is not satisfied.

In both cases we must try to find a solution by changing the assignment of
two variables. In the former case it is sufficient to consider the variables
in X for this purpose. In the latter case we must also consider the
neighboring variables of X. The reason is that for any assignment satisfying
the constraints over X, there is a constraint over variables in X and
variables in V - X that does not hold. We can determine the variables in
V - X after assigning a variable in X a new value, by recalculating X and
subsequently removing the variables that have been assigned a new value.

We can conclude that to find a nearby solution, we should assign new values
to variables of a set X. The number of variables that should be assigned a
new value is increased gradually until we find a solution. Furthermore, the
set of variables X that are candidates for change is determined dynamically.

Reducing the time complexity. Considering several sets of variables to which
we are going to assign a new value is a first source of extra overhead. If we
are going to change the values of n variables from a set X containing m
candidate variables, we need to consider $\binom{m}{n}$ different subsets of
X, where each subset is a separate CSP. Since this number can be exponential
in m (depending on the ratio between m and n), the number of CSPs we may have
to solve is $O(2^m)$, where each CSP has a worst-case time complexity that is
exponential. This would make it impossible to use a repair-based approach.
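
The size of this first source of overhead is easy to compute. The following sketch, with hypothetical function names, counts the subsets for a given n and sums them over all n, which gives exactly $2^m$.

```python
from math import comb


def subsets_to_try(m, n):
    """Number of ways to choose the n variables to change among m candidates."""
    return comb(m, n)


def total_subsets(m):
    """Number of subsets over all possible n; this equals 2**m."""
    return sum(comb(m, n) for n in range(m + 1))
```

For m = 10 candidates there are already 252 subsets of size 5 and 1024 subsets in total, each of which would be a separate CSP.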

There is also a second source of extra overhead in the search process. If we
fail to find a solution changing only n variables, we try again for some
larger value n' > n. Clearly, when changing n' variables we must repeat all
the steps of the search process for changing only n variables. This second
source of overhead can, however, be neglected in comparison with the first.

Constraint propagation can be used to speed up the search process in
constructive solution methods by pruning the search space. We will
investigate whether it can also be used to avoid solving an exponential
number of CSPs. If we could use constraint propagation to determine variables
that must be assigned a new value, we may avoid solving a substantial number
of CSPs. We could do this by determining the domain values of the variable
that are allowed by the constraints, independent of the original assignment.
If a variable is assigned a value that is no longer allowed, we know that it
must be assigned a new value.

In the same way we can also determine a better lower bound for the number 
of variables that must change their current value. Thereby we reduce the second 
source of overhead. The following example gives an illustration of how constraint 
propagation helps us to reduce both forms of overhead. 

Example. Let u, v, w be three variables and let $D_u = \{1,2\}$,
$D_v = \{1,2\}$ and $D_w = \{1,2\}$ be the corresponding domains.
Furthermore, let there be not-equal constraints between the variables u and v
and between v and w, and let $a(u) = 1$, $a(v) = 2$ and $a(w) = 1$ be an
assignment satisfying the constraints.

Now, because of some incident, a unary constraint $C_w = (w \neq 1)$ is
added. If we enforce arc-consistency on the domains $D_u$, $D_v$, $D_w$ using
the constraints, we get the following reduced domains: $S_u = \{2\}$,
$S_v = \{1\}$ and $S_w = \{2\}$. From these reduced domains it immediately
follows that all three variables must be assigned a new value. So there is no
point in investigating whether we can find a solution by changing one or two
variables. Furthermore, we do not have to consider which variables must
change their values.
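
The example can be reproduced with a small arc-consistency routine. The sketch below is an AC-3 style loop restricted to binary not-equal constraints, which is all the example needs; the function name, the representation of domains as sets and the handling of the unary constraint by filtering the domain beforehand are assumptions, not the paper's implementation.

```python
from collections import deque


def arc_consistency(domains, neq_pairs):
    """Reduce domains until every value has a support on each not-equal arc."""
    domains = {v: set(values) for v, values in domains.items()}
    arcs = list(neq_pairs) + [(y, x) for x, y in neq_pairs]
    queue = deque(arcs)
    while queue:
        x, y = queue.popleft()
        # A value of x is supported if y can still take a different value.
        unsupported = {dx for dx in domains[x]
                       if not any(dx != dy for dy in domains[y])}
        if unsupported:
            domains[x] -= unsupported
            # Re-examine the arcs pointing at x, except the one coming from y.
            queue.extend((z, w) for z, w in arcs if w == x and z != y)
    return domains


start = {"u": {1, 2}, "v": {1, 2}, "w": {1, 2}}
start["w"] = {d for d in start["w"] if d != 1}       # unary constraint w != 1
reduced = arc_consistency(start, [("u", "v"), ("v", "w")])
a = {"u": 1, "v": 2, "w": 1}
must_change = {x for x in a if a[x] not in reduced[x]}
```

Here `must_change` contains every variable whose current value has been pruned, which is how propagation identifies the variables that have to be reassigned without enumerating subsets.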

After changing the values of the three variables, there might be constraints
that are no longer satisfied. Now, using the new assignments made, we can
try, using constraint propagation, to determine the other variables that must
also be assigned a new value. The algorithm (called by the procedure ‘solve
(V, D, C, a)’) presented in the next section implements the idea of combining
local search (in ‘find (F, c, U, X, Y)’) and constraint propagation (in
‘assign_and_find (v, F, c, Y)’).

4 Implementation

Let V be a set of variables, D be an array containing a domain for each
variable, C be a set of constraints and a be an array containing an
assignment. Then the procedure ‘solve (V, D, C, a)’ is used to find a nearby
solution.

In the algorithm, all arguments of the procedures are passed by value. The
procedure ‘constraint_propagation (F)’ applies constraint propagation to the
future variables F given the new assignments made to the past variables and
the current variable v. The function ‘cost (v, d, o)’ specifies the cost of
changing the value o of the variable v to the value d. The minimum cost of
changing the values of all the variables in U given their current domains D
and current assignments a is given by the function ‘set_cost (U, D, a)’. The
constant ‘max_cost’ denotes the maximum cost of changing the infringed
solution. The function ‘conflict_var (C, a)’ determines the variables
involved in a constraint violation. The variables m, u, C and D are global
variables. Finally, the set of variables Y is used to represent the CSP
variables for which new assignments have been tried without success.



procedure solve (V, D, C, a)
    solved := false;
    X := conflict_var (C, a);
    constraint_propagation (V);
    U := {v ∈ V | a[v] ∉ D[v]};
    X := X ∪ U;
    m := set_cost (U, D, a);
    u := max_cost;
    while not solved and m < max_cost do
        find (V, 0, U, X, ∅);
        m := u;
        u := max_cost;
    end;
end.

procedure find (F, c, U, X, Y)
    if U ≠ ∅ then
        v := select_variable (U);
        assign_and_find (v, F - {v}, c, Y);
    else
        while not solved and X ≠ ∅ do
            v := select_variable (X);
            X := X - {v};
            assign_and_find (v, F - {v}, c, Y);
            Y := Y ∪ {v};
        end;
    end;
end.

procedure assign_and_find (v, F, c, Y)
    save_domain (D[v]);
    o := a[v];
    D[v] := D[v] - {o};
    while not solved and D[v] ≠ ∅ do
        d := select_value (D[v]);
        D[v] := D[v] - {d};
        a[v] := d;
        c := c + cost (v, d, o);
        save_domains_of (F);
        constraint_propagation (F);
        if not an_empty_domain (F) then
            U := {w ∈ F | a[w] ∉ D[w]};
            X := U ∪ conflict_var (C, a);
            if X = ∅ then
                solved := true;
                output (a);
            else if c + set_cost (U, D, a) < m
                    and U ∩ Y = ∅ then
                find (F, c, U, (F ∩ X) - Y, Y);
            else if U ∩ Y = ∅ then
                u := min (u, c + set_cost (U, D, a));
            end;
        end;
        restore_domains_of (F);
    end;
    a[v] := o;
    restore_domain (D[v]);
end.

conflict_var (C, a) := $\bigcup \{\, \{v_{i_1},\dots,v_{i_k}\} \mid C_{v_{i_1},\dots,v_{i_k}} \in C,\ C_{v_{i_1},\dots,v_{i_k}}(a[v_{i_1}],\dots,a[v_{i_k}]) = \mathit{false} \,\}$

set_cost (U, D, a) := $\sum_{v \in U} \min\{\mathrm{cost}(v, d, a[v]) \mid d \in D[v]\}$
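
The two auxiliary functions can be sketched as follows. The encoding of constraints as (scope, predicate) pairs and the default unit cost are assumptions made for illustration, and `set_cost` follows the reading given in the text of this section: for each variable in U, the cheapest change allowed by its current domain, summed over U. Every domain of a variable in U is assumed to be non-empty.

```python
def conflict_var(constraints, a):
    """Union of the scopes of all constraints violated by the assignment a."""
    variables = set()
    for scope, predicate in constraints:
        if not predicate(*(a[v] for v in scope)):
            variables.update(scope)
    return variables


def unit_cost(v, d, o):
    """Assumed cost function: 1 if the value changes, 0 otherwise."""
    return 0 if d == o else 1


def set_cost(U, D, a, cost=unit_cost):
    """Lower bound on the cost of giving every variable in U an allowed value."""
    return sum(min(cost(v, d, a[v]) for d in D[v]) for v in U)
```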

5 Formal Results 

The following theorem guarantees that the above algorithm finds the most 
nearby solution if a solution exists. 

Theorem 1.¹ Let $(V, \mathcal{D}, C)$ be a CSP and let
$a : V \to \bigcup \mathcal{D}$ be an assignment for the variables.

The algorithm will find a solution for the CSP by changing the current
assignment for the least number of variables in V.

In the example of Section 3, we have illustrated that the algorithm reduces 
the search overhead by applying constraint propagation. We will now present a 
result that shows that all overhead can completely be eliminated in case unary 
constraints are added to a solved instance of a CSP. Since in many practical 
situations calamities such as the unavailability of resources or the lateness of 
supplies, can be described by unary constraints, this result is very important. 

Proposition 1.¹ Let $(V, \mathcal{D}, C)$ be a CSP containing only unary and
binary constraints and let the assignment a be a solution.

If we create a new CSP by adding only unary constraints to the original
constraints, then using node-consistency in the procedure ‘solve’ and forward
checking in the procedure ‘assign_and_find’ avoids considering more than one
subset of X.

The above proposition implies that repairing a solution after adding unary 
constraints to a CSP will in the worst case have a complexity 0{T^) where T 
is the complexity of solving the CSP from scratch. Using constraint propaga- 
tion such as arc-consistency or better can bring the complexity close to 0{T). 
If, however, binary constraints are added, the situation changes. If the
current solution does not satisfy the added binary constraint, one of the two
variables of the constraint must be assigned a new value, and possibly both.
Since we do not know which one, both possibilities will be investigated until
a new solution is found. This implies that the repair time will double with
respect to adding a unary constraint. In general, if T' is the average time
needed to repair a solution after adding a unary constraint, then
$2^m \cdot T'$ is an upper bound for the average time needed to repair a
solution after adding m binary constraints.

6 Experimental Results 

The viability of the presented algorithm is best shown by comparing the
repair-based approach (RB-AC) with the constructive approach (AC) by the
number of nodes visited and the number of variables changed. The algorithms
used in the comparison both apply arc-consistency during constraint
propagation.

¹ Due to space limitations the proofs have been left out.

We have conducted two sets of tests, both with randomly generated CSPs. In
both tests, the cost of changing the assignment of one variable of the infringed
solution is 1. Therefore, the total cost of a new solution is equal to the number
of variables that are assigned a new value. To generate the test problems, we
used the four-tuple $(n, d, p_1, p_2)$. Here, n denotes the number of variables
of the generated CSPs, d the domain size of each variable, $p_1$ the probability
that there is a binary constraint between two variables and $p_2$ the
conditional probability that a pair of values is allowed by a constraint given
that there is a constraint between two variables. For several values of n and d,
we have generated instances for values of $p_1$ and $p_2$ between 0 and 1 with
step size 0.1. So, we have looked at 100 different combinations of values for
$p_1$ and $p_2$, and for each combination 10 instances have been generated
randomly.
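
The generation model described above can be sketched in a few lines of Python.
This is only an illustrative sketch, not the generator used for the reported
experiments: the function names `random_csp` and `is_consistent` and the
dictionary representation of constraints are assumptions made for this example.

```python
import itertools
import random


def random_csp(n, d, p1, p2, rng):
    """Generate a random binary CSP from the four-tuple (n, d, p1, p2).

    Returns a dict mapping each constrained pair (i, j), i < j, to the set
    of value pairs (a, b) that the constraint allows.
    """
    constraints = {}
    for i, j in itertools.combinations(range(n), 2):
        # With probability p1 there is a constraint between variables i and j.
        if rng.random() < p1:
            # Each value pair is allowed with conditional probability p2.
            constraints[(i, j)] = {
                (a, b)
                for a in range(d)
                for b in range(d)
                if rng.random() < p2
            }
    return constraints


def is_consistent(assignment, constraints):
    """Check a complete assignment (list of values) against all constraints."""
    return all(
        (assignment[i], assignment[j]) in allowed
        for (i, j), allowed in constraints.items()
    )


if __name__ == "__main__":
    rng = random.Random(42)
    csp = random_csp(n=10, d=10, p1=0.6, p2=0.7, rng=rng)
    print(len(csp), "constraints generated")
```

Passing an explicit `random.Random` instance keeps the generated instances
reproducible, which is convenient when several algorithms have to be compared
on the same problems.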

In the first set of tests, the repair-based algorithm had to find a solution for
a CSP near a randomly generated value assignment for the variables. This test
presents the worst possible scenario for the repair-based algorithm, since in
most cases many constraints will be violated by the generated assignment. As we
saw in the previous section, if T' is the average time needed to repair a
solution after adding a unary constraint, then $2^m \cdot T'$ is an upper bound
for the average time needed to repair a solution after adding m binary
constraints. For example, the results contained a CSP instance with 10
variables, each having a domain with 10 values, and with $p_1 = 0.6$ and
$p_2 = 0.7$, that was solved without backtracking using backtrack search with
arc-consistency. The repair-based algorithm with arc-consistency visited 40521
nodes to find a solution.

In the second set of tests, we first solved the generated CSP. Subsequently,
we added a unary constraint defined as follows: choose a variable at random and
delete half of the values from its domain, including the current value (the one
assigned in the solution). We have conducted the second set of tests for the
following values of n and d: (10,10), (20,10) and (50,20).^ The figures
below show some typical results. Note that the repair-based algorithm visits
more nodes than the constructive algorithm. The overhead is caused by the
iterative deepening part of the repair-based algorithm.
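
The construction of the added unary constraint, and the cost measure used in
both test sets, can be illustrated as follows. This is a minimal sketch under
stated assumptions: the names `add_unary_constraint` and `repair_cost` are
hypothetical, domains are represented as Python sets, and "half" is rounded
down for domains of odd size.

```python
import random


def add_unary_constraint(domains, solution, rng):
    """Pick a random variable and remove half of its domain values,
    always including the value it currently has in the solution.

    Returns the chosen variable and its reduced domain.
    """
    var = rng.choice(sorted(domains))
    values = sorted(domains[var])
    k = len(values) // 2  # number of values to delete (assumes len >= 2)
    others = [v for v in values if v != solution[var]]
    removed = {solution[var]} | set(rng.sample(others, k - 1))
    return var, set(values) - removed


def repair_cost(old_solution, new_solution):
    """Cost of a repair: number of variables that received a new value."""
    return sum(
        1 for var in old_solution if old_solution[var] != new_solution[var]
    )


if __name__ == "__main__":
    rng = random.Random(7)
    domains = {v: set(range(10)) for v in range(10)}
    solution = {v: rng.randrange(10) for v in range(10)}
    var, reduced = add_unary_constraint(domains, solution, rng)
    print("variable", var, "new domain", sorted(reduced))
```

Because the current value is always among the deleted ones, the old solution
is guaranteed to be infringed and at least one variable has to change.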



7 Related Work 

Below we discuss three distinct types of related work. First, as stated above,
Verfaillie and Schiex [9] proposed a repair-based method for Dynamic CSPs. Their
method starts by unassigning one of the variables of each violated constraint.
Subsequently, they generate a new solution for the unassigned variables using a
constructive search method. The new solution for the variables that were
unassigned might be inconsistent with the remaining variables. To handle this,
for each of these violated constraints one of the remaining variables is
unassigned when the constraint violation is encountered.

^ In this set of tests, many instances required more than our maximum number of
10,000,000 nodes.

At first sight, the solution method proposed in this paper might be viewed as
an extension of Verfaillie and Schiex's solution method. If only unary
constraints are added, the set U used in the solution method proposed in this
paper corresponds in some sense to the set of unassigned variables in Verfaillie
and Schiex's solution method. The fact that the variables in U are not
unassigned but only change their current assignment is not really significant.

There are, however, two very important differences. (1) Verfaillie and Schiex
apply constraint propagation (forward checking) on the set of unassigned
variables, while we apply constraint propagation on all the variables. The
latter enables an early detection of variables that must change their current
assignments. (2) The set of variables that will be assigned a new value by
Verfaillie and Schiex's solution method is rather arbitrary. It depends on the
choices of new values to be assigned to unassigned variables. There is no way to
guarantee that this will result in a nearby solution. This lack of guarantee
holds even more strongly if non-binary constraints are added.

Second, other approaches that have been proposed for DCSPs try to avoid
repetition of work [1, 5, 7, 8]. They keep track of the reason for eliminating
values from a variable's domain, possibly using reason maintenance [3], i.e.,
the search process remembers the position of the search process in the search
tree where the previous solution was found. Furthermore, the search process can
incorporate changes in the search tree caused by the addition or the removal
of constraints. This makes it possible to continue the search process in the new
situation without repetition of work.

Approaches for avoiding repetition of work can also be incorporated in the
solution method proposed in this paper. In particular, arc-consistency for
DCSPs, proposed by Bessiere [1] and by Neveu and Berlandier [5], can be useful
for preprocessing the changed CSP. The solution reuse proposed by Verfaillie and
Schiex [7, 8] does not avoid repetition of work when combined with the solution
method proposed in this paper. The reason is that the approach proposed in this
paper creates a search tree of possible repairs around an invalidated solution.
Domain reductions caused by the value assignments to the variables described
by the now invalid solution are irrelevant for this search process.

Third, a quite different approach is one that tries to prevent solutions from
becoming invalid in a DCSP. Wallace and Freuder [10] propose an approach
that consists of determining solutions that are relatively stable under
successive changes of a CSP. This stability reduces the amount of repair needed.
The underlying assumption of the approach is that changes have a certain
structure. This is a reasonable assumption in practical problems. The approach
can also be combined with the approach proposed here to reduce the amount of
repair needed.

Our approach can be viewed as a combination of local search with the
backtracking search and constraint propagation normally found in constructive
solution methods. Schaerf [6] and Zhang & Zhang [11] have also combined local
search with a constructive method. Schaerf [6] starts with a backtrack-free
constructive method that uses constraint propagation, until a dead-end is
reached. After reaching a dead-end, local search is applied to the partial
solution until a partial solution is reached with which the constructive method
can continue. This process is continued until a solution is found. Zhang and
Zhang [11] do it just the other way around. They start by generating a valid
partial solution for the first k variables. One of the approaches that they
consider for this initial phase is hill climbing. Subsequently, they try to
complete the partial solution using backtracking search on the remaining
variables.

What both these approaches have in common is that they combine constructive
and local search techniques but do not integrate them into one new approach.

8 Conclusion 

In this paper we have presented a new algorithm for Dynamic CSPs. The merit
of the new algorithm resides in the fact that it is capable of efficiently
repairing a solution with a minimal number of changes when circumstances force
us to change the set of constraints. Experimental results show that repairing
a solution is much harder than creating a new solution from scratch if many
non-unary constraints are violated by the solution that needs to be repaired.
If, however, the original solution is infringed because of the addition of a
unary constraint, the repair process is not much harder than generating a new
solution from scratch. The latter case arises often in practical situations
where machines break and goods are delivered too late.

In our future research we intend to extend our algorithm in order to approx- 
imate nearby solutions when several k-ary constraints are added. In this way 
we hope to find near optimal solutions for instances that are infeasible at the 
moment. 

Abstract. This paper proposes a simulated annealing-based approach
for obtaining compact efficient classification systems from fuzzy data. 
Different methods for generating decision rules from fuzzy data share a 
problem in multidimensional spaces: their high cardinality. In order to 
solve it, the method of simulated annealing is proposed. This approach 
is illustrated with two well-known learning sets. 



1 Introduction 

Rule production systems are, with decision trees, one of the five formalisms used
by Quinlan [12] to divide the field of machine learning. In the fuzzy framework,
fuzzy-rule-based systems have been applied mainly in control [13], and, to a
lesser extent, in classification. Apart from the obvious way of deriving fuzzy
rules from human experts, some methods for automatically generating fuzzy rules
from numerical data have been proposed [14,15,7]. Another way of generating
rules from fuzzy data consists of going back to the meaning of the implication.
From a set-theoretic point of view, the implication $A \to B$ is true if, and
only if, $\forall x,\ x \in A \to x \in B$, that is, $A \subseteq B$. Fuzzy set
theory has a wide class of measures of containment. So, by measuring the
containment between A and B, we can describe the strength of the implication
between a set of fuzzy constraints A and a conclusion B.

Independently of the method used to generate the rules, there is a problem 
common to all of them. The generated set of rules will usually have high cardi- 
nality. In order to select a compact and efficient subset of the rules, a method 
of stochastic search like simulated annealing can be used. This paper addresses 
this question. 

The paper is organized as follows: Section 2 introduces classification systems 
with fuzzy data, and we describe various approaches to the problem of rule gen- 
eration. Section 3 summarizes the simulated annealing technique and describes 
its application to construct compact efficient rulesets from generated rules. In 
Section 4, simulation results for two well-known problems of classification are 
shown to illustrate the proposed approach. Finally, Section 5 concludes this pa- 
per. 



S. A. Cerri and D. Dochev (Eds.): AIMSA 2000, LNAI 1904, pp. 283-291, 2000.
© Springer-Verlag Berlin Heidelberg 2000






Francisco Botana 



2 Classification Systems 

Let X be the universe of objects. Each object is described by a collection of
attributes $A = \{A_1, \dots, A_n\}$. Each attribute $A_i$ measures a feature of
an object and has a discrete domain of values $v_{i1}, \dots, v_{ik_i}$. We
assume that an object can have non-zero membership to different values of each
attribute. The membership function of the objects in an attribute-value pair
$[A_i : v_{ij}]$ is thus a function

    $\mu_{ij} : X \to [0,1]$ .    (1)

Each object belongs in some degree to every class of a set
$C = \{c_1, \dots, c_l\}$. So, a known object of the universe, an example, is
defined by a fixed-length vector such as

    $(([A_1 : v_{11}], \mu_{11}), \dots, ([A_n : v_{nk_n}], \mu_{nk_n}), (c_1, \mu_{c_1}), \dots, (c_l, \mu_{c_l}))$ .    (2)

This vector has a number of components given by the sum of the cardinalities
of the attribute domains and the number of classes.

2.1 Rule Generation 

A classification rule is an if-then statement whose antecedent is a conjunction
of attribute-value pairs and whose consequent is a class:

    $[A_r : v_{rj}], \dots, [A_s : v_{sk}] \to c_p$ .    (3)

As stated in the Introduction, some methods for automatically generating
rules have been reported. For the sake of simplicity, we will assume a two-input
one-output system. In [15], the variables $x_1, x_2, y$ are discretized in
possibly overlapping categories. Each input-output data pair

    $((x_1^i, x_2^i), y^i)$    (4)

produces a rule

    $[A_1 : C_{x_1^i}], [A_2 : C_{x_2^i}] \to C_{y^i}$ ,    (5)

where $C_{x_1^i}$, $C_{x_2^i}$, $C_{y^i}$ are the variable categories for which
the membership of the values is the greatest.

A partition of the space of objects by a simple fuzzy grid is used in [7] for
generating the rules. Each element of the partition gives rise to a rule. The
coordinates of the element are the rule antecedents. Its class can be defined
heuristically by taking the most probable one in this element.

Another way of generating rules from fuzzy data [2] goes back to the meaning
of the implication. When dealing with crisp noise-free data, an error-free
rule such as $p_1, p_2 \to c$ can be inferred if the extension of
$p_1 \cap p_2$ is contained in that of c. A direct translation into the fuzzy
case, using the containment definition proposed by Zadeh, leads to rejecting
rules as soon as a single case does not fulfill the implication. By fuzzifying
the notion of set containment [9,3] and discretizing the attribute space, a rule
for each possible combination of antecedents can be formed. The rule class is
the one which best contains the intersection of the fuzzy antecedents.
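
To make the idea of graded containment concrete, the sketch below uses the
ratio-based subsethood measure commonly attributed to Kosko [9],
S(A, B) = sum(min(a, b)) / sum(a). This is an assumption for illustration only:
the text does not spell out the measure, and the measures derived in [3] may
differ. The function names and the convention for an empty antecedent are
likewise hypothetical.

```python
def subsethood(a, b):
    """Degree to which fuzzy set `a` is contained in fuzzy set `b`.

    Both sets are given as lists of membership degrees over the same
    examples. An empty `a` is treated as fully contained (a convention
    chosen for this sketch).
    """
    total = sum(a)
    if total == 0:
        return 1.0
    return sum(min(x, y) for x, y in zip(a, b)) / total


def best_class(antecedent, classes):
    """Return the class whose membership vector best contains the antecedent."""
    return max(classes, key=lambda c: subsethood(antecedent, classes[c]))


if __name__ == "__main__":
    # Membership of three examples in the (intersected) antecedent.
    antecedent = [0.5, 1.0, 0.0]
    # Membership of the same three examples in two hypothetical classes.
    classes = {"c1": [1.0, 0.5, 0.3], "c2": [0.5, 1.0, 0.0]}
    print(best_class(antecedent, classes))
```

In contrast to crisp containment, a single example that violates the
implication only lowers the degree of containment instead of rejecting the
rule outright.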

2.2 Classification of New Objects

In order to classify an object from a set of rules, we proceed as follows: for 
each rule, we calculate the membership of the antecedent. The membership of 
the consequent will be set equal to that one. If there is more than one rule 
that predicts a class, take the highest membership. So, in general, a ruleset will 
classify an object in all classes with different membership. An object will be 
correctly classified if its known class with the highest membership equals the 
predicted class with the highest membership. 
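
The classification procedure can be sketched as follows. The sketch assumes
that the membership of a conjunctive antecedent is the minimum of the
memberships of its attribute-value pairs; the text does not state the
conjunction operator, but this choice reproduces the predicted memberships
reported for example 1 in Table 3. The rules are the five rules of the best
trial in Section 4.1 and the data are example 1 of Table 1; the function name
`classify` and the dictionary layout are hypothetical.

```python
# The five rules of the best trial (Section 4.1): (antecedent, class).
RULES = [
    ({"temperature": "mild", "wind": "false"}, "volleyball"),
    ({"outlook": "cloudy", "temperature": "hot", "wind": "false"}, "swimming"),
    ({"outlook": "sunny", "temperature": "hot", "humidity": "high",
      "wind": "true"}, "swimming"),
    ({"temperature": "cool", "wind": "true"}, "weight_lifting"),
    ({"humidity": "normal", "wind": "true"}, "weight_lifting"),
]

# Example 1 of Table 1: membership of each attribute-value pair.
EXAMPLE_1 = {
    ("outlook", "sunny"): 0.9, ("outlook", "cloudy"): 0.1,
    ("outlook", "rain"): 0.0,
    ("temperature", "hot"): 1.0, ("temperature", "mild"): 0.0,
    ("temperature", "cool"): 0.0,
    ("humidity", "high"): 0.8, ("humidity", "normal"): 0.2,
    ("wind", "true"): 0.4, ("wind", "false"): 0.6,
}


def classify(example, rules):
    """Return the predicted membership of the example in each class."""
    predicted = {}
    for antecedent, cls in rules:
        # Antecedent membership: minimum over its attribute-value pairs
        # (an assumption of this sketch).
        degree = min(example[pair] for pair in antecedent.items())
        # Several rules for the same class: keep the highest membership.
        predicted[cls] = max(predicted.get(cls, 0.0), degree)
    return predicted


if __name__ == "__main__":
    result = classify(EXAMPLE_1, RULES)
    print(result, "->", max(result, key=result.get))
```

For example 1 the highest predicted membership belongs to swimming, which is
also the known class with the highest membership (0.8), so the example counts
as correctly classified.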



3 The Method of Simulated Annealing 

3.1 Introduction of the Algorithm 

Simulated annealing is a kind of stochastic search that has its roots in metallurgy. 
Annealing is a technique used in the fabrication of objects made of metal or glass. 
When these objects are shaped, small regions of stress develop in response to 
deformations at the atomic level and cause the object to be prone to fracture. 
Annealing consists of heating the object till its atoms have sufficient energy 
to relax any stress, and then, cooling it slowly. If the cooling process is done 
properly, there will not be regions of stress. 

Simulated annealing [8,4] is a method for solving combinatorial optimization
problems based on the annealing of solids. The energy of the system is replaced
by a cost function, and the temperature by a control parameter. The simulation
consists of mimicking the evolution of the system towards its thermodynamic
equilibrium, for a given value of its control parameter. The equilibrium state
with a small value of the parameter is taken as the solution.

3.2 Application to the Classification Problem 

The problem of classification can be stated as follows: Given a set of rules R
derived from a data universe U, find the subset G with minimum cardinality
such that G is the best classification ruleset for the data U.

This combinatorial optimization problem cannot be solved to optimality in
practice because of the computation time required. The set R can be very large,
and the cost landscape over its subsets usually contains many local minima.
However, a good local minimum found with the simulated annealing technique may
be nearly as good as the global minimum. For a detailed discussion of the
convergence of the algorithm see [10].

Let $R = \{r_i : i = 1, \dots, m\}$. A configuration conf of the system is any
subset of R coded as a string of m bits, where a 1 in the i-th position denotes
that the rule $r_i$ is present in the configuration; otherwise a 0 appears. The
cost of a configuration must resolve the imprecision in the statement of the
classification problem by assigning a weight to each of the mentioned factors,
minimum cardinality ($w_{card}$) and best classification ruleset ($w_{clas}$):

    $C(conf) = w_{clas} \cdot \#misclassified\_examples + w_{card} \cdot \#present\_rules$ .    (6)

A transition of the system from a configuration conf to another newconf is
implemented by randomly choosing a bit position and complementing its value.
The simulated annealing procedure is then stated in pseudocode as follows



begin
   c <- initial value of the control parameter;
   conf <- random configuration;
   repeat
      repeat
         newconf <- transition(conf);
         gain <- C(conf) - C(newconf);
         if gain > 0
            then conf <- newconf;
            else if exp(gain / c) > random[0,1)
               then conf <- newconf;
      until equilibrium for c is reached;
      c <- reduce(c);
   until global equilibrium is reached
end



In order to fully describe the practical implementation of the algorithm, it
only remains to assign values to the parameter c and to specify the equilibrium
and stopping conditions. This description is referred to as a cooling schedule.
The proposed schedule is:

- initial value of the control parameter.
  With the aim that almost all transitions from any starting configuration will
  be accepted, we begin with a usually large value for c, and perform a number
  of transitions. If the ratio between accepted and proposed transitions is less
  than 0.8, double the value of c. When this ratio exceeds 0.8, take this value
  of c as the initial one.

- decrement of the control parameter: reduce(c) = 0.9c

- reaching the equilibrium for a value of c.
  For any value of c the system is in equilibrium if at least 50 transitions are
  accepted.

- stop the algorithm.
  The algorithm will be halted after 25 decrements of the control parameter.
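
The procedure and cooling schedule above can be put together in a short Python
sketch. It follows equation (6) and the listed schedule, with several
assumptions that are not part of the description: the starting value `c = 1.0`
and the number of calibration trials, the cap `max_proposals` on proposals per
temperature (a safeguard against very long runs at low temperatures), returning
the best configuration seen rather than the final one, and the toy
`misclassified` function used in the demonstration.

```python
import math
import random


def cost(conf, misclassified, w_clas=100, w_card=1):
    """Equation (6): weighted misclassifications plus number of rules present."""
    return w_clas * misclassified(conf) + w_card * sum(conf)


def transition(conf, rng):
    """Complement one randomly chosen bit of the configuration."""
    i = rng.randrange(len(conf))
    new = list(conf)
    new[i] = 1 - new[i]
    return new


def accept(gain, c, rng):
    """Acceptance test of the pseudocode: always accept improvements."""
    return gain > 0 or math.exp(gain / c) > rng.random()


def initial_c(conf, cost_fn, rng, c=1.0, trials=100):
    """Double c until at least 80% of the proposed transitions are accepted."""
    while True:
        current, accepted = list(conf), 0
        for _ in range(trials):
            new = transition(current, rng)
            if accept(cost_fn(current) - cost_fn(new), c, rng):
                current, accepted = new, accepted + 1
        if accepted / trials >= 0.8:
            return c
        c *= 2.0


def anneal(m, cost_fn, rng, decrements=25, needed=50, max_proposals=5000):
    """Simulated annealing over bit strings of length m."""
    conf = [rng.randint(0, 1) for _ in range(m)]
    best = list(conf)
    c = initial_c(conf, cost_fn, rng)
    for _ in range(decrements):          # stop after 25 decrements
        accepted = proposals = 0
        # Equilibrium: 50 accepted transitions (or the safeguard cap).
        while accepted < needed and proposals < max_proposals:
            proposals += 1
            new = transition(conf, rng)
            if accept(cost_fn(conf) - cost_fn(new), c, rng):
                conf, accepted = new, accepted + 1
                if cost_fn(conf) < cost_fn(best):
                    best = list(conf)
        c *= 0.9                         # reduce(c) = 0.9c
    return best


def toy_misclassified(conf):
    """Hypothetical stand-in: one error per missing 'essential' rule."""
    return sum(1 for i in (0, 3, 7) if conf[i] == 0)


def toy_cost(conf):
    return cost(conf, toy_misclassified)


if __name__ == "__main__":
    best = anneal(12, toy_cost, random.Random(0))
    print(best, toy_cost(best))
```

In a real application `toy_misclassified` would be replaced by a function that
classifies the learning set with the rules selected by `conf` and counts the
errors.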



4 Experimental Results 

4.1 Symbolic Data 

The classification problem proposed in [11] and reformulated in [17] will be
used to test our approach with non-numeric data. The classification task
consists of deciding which sport to play on a Saturday morning knowing a bit of
information about the weather. There are three classes {swimming, volleyball,
weight_lifting} and mornings are characterized by four attributes with values
outlook={sunny, cloudy, rain}, temperature={cool, mild, hot}, humidity={high,
normal} and wind={true, false}. The fuzzy membership of each attribute-value
pair and each class for 16 examples is shown in Table 1.



Table 1. The learning set used in the experiment

Ex.  Outlook              Temperature       Humidity      Wind          Class
     Sunny Cloudy Rain    Hot   Mild  Cool  High  Normal  True  False   Voll  Swim  Wlif

 1   0.9   0.1    0.0     1.0   0.0   0.0   0.8   0.2     0.4   0.6     0.0   0.8   0.2
 2   0.8   0.2    0.0     0.6   0.4   0.0   0.0   1.0     0.0   1.0     1.0   0.7   0.0
 3   0.0   0.7    0.3     0.8   0.2   0.0   0.1   0.9     0.2   0.8     0.3   0.6   0.1
 4   0.2   0.7    0.1     0.3   0.7   0.0   0.2   0.8     0.3   0.7     0.9   0.1   0.0
 5   0.0   0.1    0.9     0.7   0.3   0.0   0.5   0.5     0.5   0.5     0.0   0.0   1.0
 6   0.0   0.7    0.3     0.0   0.3   0.7   0.7   0.3     0.4   0.6     0.2   0.0   0.8
 7   0.0   0.3    0.7     0.0   0.0   1.0   0.0   1.0     0.1   0.9     0.0   0.0   1.0
 8   0.0   1.0    0.0     0.0   0.2   0.8   0.2   0.8     0.0   1.0     0.7   0.0   0.3
 9   1.0   0.0    0.0     1.0   0.0   0.0   0.6   0.4     0.7   0.3     0.2   0.8   0.0
10   0.9   0.1    0.0     0.0   0.3   0.7   0.0   1.0     0.9   0.1     0.0   0.3   0.7
11   0.7   0.3    0.0     1.0   0.0   0.0   1.0   0.0     0.2   0.8     0.4   0.7   0.0
12   0.2   0.6    0.2     0.0   1.0   0.0   0.3   0.7     0.3   0.7     0.7   0.2   0.1
13   0.9   0.1    0.0     0.2   0.8   0.0   0.1   0.9     1.0   0.0     0.0   0.0   1.0
14   0.0   0.9    0.1     0.0   0.9   0.1   0.1   0.9     0.7   0.3     0.0   0.0   1.0
15   0.0   0.0    1.0     0.0   0.0   1.0   1.0   0.0     0.8   0.2     0.0   0.0   1.0
16   1.0   0.0    0.0     0.5   0.5   0.0   0.0   1.0     0.0   1.0     0.8   0.6   0.0


For each class there are

    $(3+3+2+2) + (3 \cdot 3 + 3 \cdot 2 + 3 \cdot 2 + 3 \cdot 2 + 3 \cdot 2 + 2 \cdot 2) + (3 \cdot 3 \cdot 2 + 3 \cdot 2 \cdot 2 + 3 \cdot 3 \cdot 2) + 3 \cdot 3 \cdot 2 \cdot 2 = 131$

possible rules. Using a well-known fuzzy subsethood measure [9] to decide their
classification, a set of 131 rules is obtained. The cost function is defined to
favor the number of correctly classified examples over the cardinality of the
ruleset. So, taking into account the average number of rules in a random
configuration and the size of the learning set, the selected weights are
$w_{clas} = 100$, $w_{card} = 1$.

The results of five trials of the simulated annealing algorithm are shown in 
Table 2. The best result returned five rules with 13 terms that correctly classify 
all the learning examples: 



- temperature=mild, wind=false → volleyball.

- outlook=cloudy, temperature=hot, wind=false → swimming.

- outlook=sunny, temperature=hot, humidity=high, wind=true → swimming.

- temperature=cool, wind=true → weight_lifting.

- humidity=normal, wind=true → weight_lifting.



The classification results for this set of five rules are shown in Table 3. 

Table 2. Simulation results for the Sports learning set

Trial no.  Rewriting accuracy  No. of rules  No. of terms

    1            15/16               6            11
    2            16/16               6            17
    3            15/16               5             8
    4            16/16               5            13
    5            16/16               8            24



These results outperform those obtained in [17] with a fuzzy ID3 strategy
and in [16] with an information-gain-based method for deriving rules. Both
approaches report a rewriting accuracy of 13/16 with 6 rules, and 11 and 9
terms, respectively.



Table 3. The Sports learning set results (best trial)

Ex.  Classification known in training data    Classification with learned rules
     Volleyball  Swimming  Weight_lifting     Volleyball  Swimming  Weight_lifting

 1      0.0        0.8         0.2               0.0        0.4         0.2
 2      1.0        0.7         0.0               0.4        0.2         0.0
 3      0.3        0.6         0.1               0.2        0.7         0.2
 4      0.9        0.1         0.0               0.7        0.3         0.3
 5      0.0        0.0         1.0               0.3        0.1         0.5
 6      0.2        0.0         0.8               0.3        0.0         0.4
 7      0.0        0.0         1.0               0.0        0.0         0.1
 8      0.7        0.0         0.3               0.2        0.0         0.0
 9      0.2        0.8         0.0               0.0        0.6         0.4
10      0.0        0.3         0.7               0.1        0.0         0.9
11      0.4        0.7         0.0               0.0        0.3         0.0
12      0.7        0.2         0.1               0.7        0.0         0.3
13      0.0        0.0         1.0               0.0        0.1         0.9
14      0.0        0.0         1.0               0.3        0.0         0.7
15      0.0        0.0         1.0               0.0        0.0         0.8
16      0.8        0.6         0.0               0.5        0.0         0.0



4.2 Iris Data 

The Iris database [1] is a well-known test bench in the pattern recognition and
machine learning communities. The data set contains 3 crisp classes of 50
examples each, where each class refers to a type of iris plant (Setosa,
Versicolor and Virginica). There are four attributes (the length and width of
the plant's petal and sepal, in centimeters). In Figure 1 the domains for the
problem are given. Each attribute has 3 linguistic values: short/narrow, medium
and long/wide.





[Figure: four panels showing the fuzzy domains of Sepal Length, Sepal Width,
Petal Length and Petal Width]

Fig. 1. Domains for iris attributes



The results of the simulated annealing technique for the Iris dataset are
shown in Table 4. The number of rules is 155 and $w_{clas} = 100$,
$w_{card} = 1$. In all trials the obtained rulesets misclassify 4 examples. The
best results, regarding the number of terms, were returned in two trials:

- petal width=narrow → Setosa.
- petal length=medium, petal width=medium → Versicolor.
- petal length=long → Virginica.
- sepal length=medium, petal width=wide → Virginica.

- petal length=short → Setosa.
- petal length=medium, petal width=medium → Versicolor.
- petal width=wide → Virginica.
- sepal length=medium, petal length=long → Virginica.

The results of other algorithms on the Iris dataset are shown in Table 5 for
comparison.

Table 4. Simulation results for the Iris dataset

Trial no.  Rewriting accuracy  No. of rules  No. of terms

    1           146/150              4             6
    2           146/150              8            17
    3           146/150              4             6
    4           146/150              4             7
    5           146/150              6            14



Table 5. Accuracies of other algorithms on the Iris dataset

Algorithm         Accuracy

GA approach [7]   149/150
FIL [16]          144/150
GVS [6]           142/150
IVSM [5]          141/150



5 Conclusion 

In this paper, we proposed a simulated-annealing-based approach to the
construction of compact fuzzy classification systems with if-then rules. This
approach is conceptually and computationally simple. Experimental results show
that this method produces efficient rulesets, with an accuracy better than or
similar to that of other learning systems.



References 

1. Blake, C., Keogh, E., Merz, C. J.: UCI Repository of machine learning databases
[http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of
California

2. Botana, F.: Learning rules from fuzzy datasets. Proc. 5th Europ. Congress on Intel.
Techniques and Soft Comp., Aachen, Germany (1997) 1109-1113

3. Botana, F.: Deriving fuzzy subsethood measures from violations of the implication
between elements. Lect. Notes Artif. Intel. 1415 (1998) 234-243

4. Cerny, V.: Thermodynamical Approach to the Traveling Salesman Problem: An
Efficient Simulation Algorithm. J. Optim. Theory Appl. 45 (1985) 41-51

5. Hirsh, H.: Generalizing version spaces. Mach. Learning 17 (1994) 5-46

6. Hong, T. P., Tseng, S. S.: A generalized version space learning algorithm for noisy
and uncertain data. IEEE Trans. Knowledge Data Eng. 9 (1997) 336-340



Construction of Efficient Rulesets 






7. Ishibuchi, H., Nozaki, K., Yamamoto, N., Tanaka, H.: Construction of fuzzy
classification systems with rectangular fuzzy rules using genetic algorithms.
Fuzzy Sets Syst. 65 (1994) 237-253

8. Kirkpatrick, S., Gelatt, C. D., Vecchi, M. P.: Optimization by Simulated Annealing.
Science 220 (1983) 671-680

9. Kosko, B.: Neural networks and fuzzy systems. Prentice Hall, Englewood Cliffs
(1992)

10. Laarhoven, P., Aarts, E.: Simulated Annealing: Theory and Applications. Reidel,
Dordrecht (1987)

11. Quinlan, J. R.: Induction of decision trees. Mach. Learning 1(1) (1986) 81-106

12. Quinlan, J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San
Mateo (1993)

13. Sugeno, M.: An Introductory Survey of Fuzzy Control. Inf. Sci. 36 (1985) 59-83

14. Takagi, T., Sugeno, M.: Fuzzy Identification of Systems and its Applications to
Modeling and Control. IEEE Trans. Systems Man Cybernet. 15 (1985) 116-132

15. Wang, L. X., Mendel, J. M.: Generating fuzzy rules by learning from examples.
IEEE Trans. Systems Man Cybernet. 22(6) (1992) 1414-1427

16. Wang, C. H., Liu, J. F. et al.: A fuzzy inductive learning strategy for modular
rules. Fuzzy Sets Syst. 103 (1999) 91-105

17. Yuan, Y., Shaw, M. J.: Induction of fuzzy decision trees. Fuzzy Sets Syst. 69 (1995)
125-139



Fuzzy-Neural Models for Real-Time Identification and Control of a Mechanical
System

Ieroham S. Baruch, J. Martin Flores, J. Carlos Martinez, and Boyka Nenkova

1 CINVESTAV-IPN
Av. IPN No 2508, A.P. 14470 Mexico D.F., C.P. 07360 Mexico.

2 IIT-BAS, Sofia, Bulgaria
baruch@ctrl.cinvestav.mx



Abstract. A two-layer Recurrent Neural Network Model (RNNM) and
an improved Backpropagation-through-time method for its learning are
described. For the identification of complex nonlinear plants, a
fuzzy-neural multi-model is proposed. The proposed fuzzy-neural model,
containing two RNNMs, is applied to real-time identification of a nonlinear
mechanical system. The simulation and experimental results confirm the
applicability of the RNNM.



1 Introduction 

Recent developments in science and technology provide a wide scope of
applications of high-performance electric motor drives in various industrial
processes. In high-performance motor drive applications involving mechatronics,
such as robotics, rolling mills, machine tools, etc., accurate speed or position
control is of critical importance, and DC motors are still widely used to
accomplish this task. There is an increasing number of applications in
high-precision motion control systems in manufacturing, e.g., ultra-precision
machining, assembly of small components and micro drives. It is very difficult
to assure high positioning accuracy because many factors affect the precision of
motion, such as friction, backlash and stiffness in the drive system [13,11].
Friction is a natural resistance to relative motion between two contacting
bodies. The friction model has been widely studied by numerous researchers;
extensive work can be found in [1,4,7] and [8]. Friction is commonly modelled as
a linear combination of Coulomb friction, stiction, viscous friction, and the
Stribeck effect. The presence of nonlinear friction forces is unavoidable in
high-performance motion control systems. In servo systems, if the controller is
designed without consideration of the friction, the closed-loop system may show
steady-state tracking error and/or oscillations. In addition, the friction
characteristics may change easily due to changes in the environment, such as
load variations and temperature and humidity changes, and some dynamic effects
may be observed. So the standard PID-type servo control algorithm is not capable
of delivering the desired precision under the influence of friction. Friction
compensation is not an easy problem, as the friction function is a highly
nonlinear, non-monotonic function. Some early works proposed using an additive
high-frequency signal in near-zero velocity control to linearize the nonlinear
friction effects. Other researchers proposed compensating the friction effects
by means of adaptive schemes. In [9], adaptive control was explored for the
positioning of a table with friction. A nonlinear compensation technique which
has a nonlinear proportional feedback control force for the regulation of a
one-degree-of-freedom mechanical system was proposed by [8]. Adaptive friction
compensation for DC-motor drives and robot control systems is given by [13] and
[11]. Some advanced work has also been done on the application of Neural
Networks (NN) to adaptive friction compensation. The works cited in [7] applied
CMAC-based NNs for robust control of systems with friction. [7] applied
reinforcement adaptive learning, based on a Radial Basis Function (RBF) NN, for
friction compensation of a high-speed precise mechanical system. [13] and [11]
applied feedforward NNs for identification and control of DC-motor drives. As
can be seen, the schemes proposed in the literature on NN learning control
systems possess high complexity and high dimensionality, which makes them hard
to apply. To avoid this complexity, it is appropriate to use the multi-model NN
approach, as is done for NN identification of hysteresis and backlash models [5]
and [12]. Some work in this field, allowing complex nonlinear dynamic objects to
be identified by means of a multi-model neural network, has been done by [2] and
[3]. So, the purpose of this paper is to apply the fuzzy-neural multi-model
approach [2,3] to identify in real time an unknown mechanical system with
friction.

2 Recurrent Neural Model. Fuzzy-Neural Multi-Model. 
Control Algorithm 

In [2] and [3] a discrete-time model of a Recurrent Trainable Neural Network
(RTNN) and the dynamic Backpropagation (BP) weight updating rule are
given. The RTNN model is described by the following equations:

    $X(k+1) = J X(k) + B U(k)$
    $Z(k) = S[X(k)]$                (1)
    $Y(k) = S[C Z(k)]$

where $X(\cdot)$ is an n-dimensional state vector of the system; $U(\cdot)$ is an
m-dimensional input vector; $Y(\cdot)$ is an l-dimensional output vector;
$Z(\cdot)$ is an auxiliary vector variable with dimension l; $S(\cdot)$ is a
vector-valued sigmoid function with appropriate dimension; J is a weight-state
block-diagonal matrix with (1 x 1) and (2 x 2) blocks; B and C are weight
input and output matrices with appropriate dimensions and block structure,
corresponding to the block structure of J. As can be seen from equations (1),
the given RTNN model is a two-layer hybrid one, with one feedforward output
layer and one recurrent hidden layer. It is also a completely parallel
parametric one, so it is useful for identification and control purposes. This
RTNN model is nonlinear in the large and linear in the small, so the
linearization of the sigmoid functions allows its dynamic properties, such as
stability, observability and controllability, to be studied. To preserve the
RTNN stability during learning, the model poles (the diagonal elements of the
matrix J) must be restricted to remain inside the unit circle. The general BP
learning algorithm is given in the form:

$$W_{ij}(k+1) = W_{ij}(k) + \eta\,\Delta W_{ij}(k) + \alpha\,\eta\,\Delta W_{ij}(k-1) \qquad (2)$$

where $W_{ij}$ (C, J, B) is the ij-th weight element of each weight matrix
(given in parentheses) of the RTNN model to be updated; $\Delta W_{ij}$ is the
weight correction of $W_{ij}$; η and α are learning rate parameters. The updates
$\Delta C_{ij}$, $\Delta J_{ij}$ and $\Delta B_{ij}$ of the model weights
$C_{ij}$, $J_{ij}$ and $B_{ij}$ are given by:

$$\Delta C_{ij}(k) = [T_j(k) - Y_j(k)]\,Y_j(k)\,[1 - Y_j(k)]\,Z_i(k) \qquad (3)$$

$$\Delta J_{ij}(k) = R\,X_i(k-1) \qquad (4)$$

$$R = C_i(k)\,[T(k) - Y(k)]\,Z_j(k)\,[1 - Z_j(k)] \qquad (5)$$

$$\Delta B_{ij}(k) = R\,U_i(k) \qquad (6)$$



where T is the target vector with dimension l and [T - Y] is the output error
vector, with the same dimension; R is an auxiliary variable.
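The model (1) and the update rules (2) to (6) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: J is taken as purely diagonal (1 x 1 blocks only), the index structure of (3) to (6) is read in vectorized form, the momentum term follows one reading of rule (2), and the class name `RTNN`, the initial weight range and the clipping bound 0.99 are hypothetical choices.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class RTNN:
    """Toy RTNN with a purely diagonal J (assumption: 1 x 1 blocks only)."""

    def __init__(self, n_in, n_state, n_out, eta=0.01, alpha=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.J = rng.uniform(-0.5, 0.5, n_state)             # diagonal of J
        self.B = rng.uniform(-0.5, 0.5, (n_state, n_in))
        self.C = rng.uniform(-0.5, 0.5, (n_out, n_state))
        self.eta, self.alpha = eta, alpha
        self.x = np.zeros(n_state)                            # X(k)
        self.x_prev = np.zeros(n_state)                       # X(k-1)
        self.prev = {name: 0.0 for name in ("J", "B", "C")}   # previous corrections

    def output(self):
        z = sigmoid(self.x)              # Z(k) = S[X(k)]
        y = sigmoid(self.C @ z)          # Y(k) = S[C Z(k)]
        return z, y

    def learn(self, u, target):
        z, y = self.output()
        err = target - y
        r = (self.C.T @ err) * z * (1.0 - z)                  # eq. (5), vectorized
        delta = {
            "C": np.outer(err * y * (1.0 - y), z),            # eq. (3)
            "J": r * self.x_prev,                             # eq. (4)
            "B": np.outer(r, u),                              # eq. (6)
        }
        for name, d in delta.items():                         # eq. (2)
            w = getattr(self, name)
            w = w + self.eta * d + self.alpha * self.eta * self.prev[name]
            setattr(self, name, w)
            self.prev[name] = d
        # Keep the poles inside the unit circle (the bound 0.99 is an assumption).
        self.J = np.clip(self.J, -0.99, 0.99)
        return y

    def advance(self, u):
        self.x_prev = self.x
        self.x = self.J * self.x + self.B @ u                 # X(k+1) = J X(k) + B U(k)

# Hypothetical usage: one input, two hidden nodes, one output.
net = RTNN(n_in=1, n_state=2, n_out=1)
for k in range(50):
    u = np.array([np.sin(0.1 * k)])
    net.learn(u, target=np.array([0.6]))
    net.advance(u)
```

Clipping the diagonal of J after every update is one simple way of enforcing the stability restriction stated above; other projection schemes are equally possible.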

For some nonlinear plant models with smooth nonlinearities, the given BP
learning algorithm has demonstrated fast convergence and a small mean square
error in the final epoch of learning [2]. But for nonlinear plants with
non-invertible or asymmetric nonlinearities, learning performed poorly, with a
large mean square error, or could not be carried out at all. In these cases,
other approximation techniques, such as fuzzy linguistic or relational models,
could be applied. Classical fuzzy rule-based approximation techniques have the
disadvantage that defuzzification must be applied to obtain the fuzzy model
output. The Takagi-Sugeno model is a mixed linguistic and mathematical
regression model which does not need defuzzification, because the rule
consequents are crisp mathematical functions of the model inputs. The function
used in the consequent part of the rule could be a static or a dynamic
(state-space) model, whose region of validity is determined by the membership
function. To extend the validity limits of the membership functions, which
depend on the approximation error, the authors of [2] and [3] proposed using an
RTNN function model as the crisp consequent function, so as to form a
fuzzy-neural multi-model. The fuzzy rule of this model is given by the
following statement:

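$$R_i:\ \text{IF } x \text{ is } A_i \text{ THEN } y_i(k+1) = N_i[x(k), u(k)], \quad i = 1, 2, \ldots, p \qquad (7)$$

where $A_i$ denotes the fuzzy set of the i-th rule; $N_i(\cdot)$ denotes the
RTNN function, given by eq. (1); i is the model number; p is the total number
of RTNN models, corresponding to the fuzzy rules $R_i$.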

The aim of this paper is to propose an adequate control law corresponding to
this fuzzy-neural multi-RTNN model, and to apply it to the real-time
identification and control of a nonlinear mechanical system with friction. A
learning adaptive model such as the fuzzy-neural multi-RTNN model is expected
to be well suited to the identification and control of such a nonlinear
process, with its unknown variable parameters, asymmetric nonlinearity and
unknown dynamic effects. First, a trajectory tracking control law corresponding
to the linearized RTNN model can be defined.

Following [10] and [6], the following adaptive tracking control law can be
written:



$$U(k) = [CB]^{-1}\{-CJX(k) + Y^d(k+1) + \gamma\,[Y^d(k) - Y(k)]\} \qquad (8)$$

where γ is a constant with values between -0.999 and 0.999; C, B and J are
real matrices with dimensions corresponding to the RTNN model given by (1);
and $Y^d(k)$ is the reference signal. If the RTNN is controllable and
observable, then the matrix product of the RTNN parameters (1) satisfies
CB ≠ 0. In the multi-model case, the corresponding fuzzy coordination rule is
given by the statement:

$$R_i:\ \text{IF } x \text{ is } A_i \text{ THEN } U_i(k) = F_i[X(k), Y^d(k), Y(k)], \quad i = 1, 2, \ldots, p \qquad (9)$$

where $U_i(k) = F_i[X(k), Y^d(k), Y(k)]$ denotes the corresponding control
function (8); i is the model number; p is the total number of RTNN models,
corresponding to the fuzzy rules $R_i$ given by (7).
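The control law (8) can be checked numerically on a linearized model. The sketch below is illustrative only: the function name `tracking_control` and the matrices J, B, C are hypothetical values chosen so that CB is nonsingular, and the sigmoids are replaced by their linearization (unit slope), as the text assumes for the control design.

```python
import numpy as np

def tracking_control(C, J, B, x, y, yd_now, yd_next, gamma=0.9):
    """Eq. (8): U(k) = [CB]^-1 { -C J X(k) + Yd(k+1) + gamma [Yd(k) - Y(k)] }."""
    CB = C @ B
    rhs = -C @ J @ x + yd_next + gamma * (yd_now - y)
    return np.linalg.solve(CB, rhs)   # requires CB to be square and nonsingular

# Hypothetical two-state, single-input, single-output linearized model.
J = np.diag([0.5, -0.3])
B = np.array([[1.0], [0.5]])
C = np.array([[0.8, 0.4]])
x = np.array([0.2, -0.1])
y = C @ x
yd_now, yd_next = np.array([1.0]), np.array([1.0])

u = tracking_control(C, J, B, x, y, yd_now, yd_next)
y_next = C @ (J @ x + B @ u)          # linearized one-step prediction
```

Substituting (8) into the linearized model gives $Y(k+1) = Y^d(k+1) + \gamma[Y^d(k) - Y(k)]$, so the tracking error obeys $e(k+1) = -\gamma\,e(k)$ and decays for |γ| < 1.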

The following part of the paper gives a brief description of this process and 
shows its general equations and parameters. 

3 Friction Model 

The general equation of a 1-DOF mass system with friction is given in the
form [1,4,7] and [8]:



$$m\,\ddot{q}(t) + f_r(\dot{q}, t) + d(t) = k_0\,u(t) \qquad (10)$$



where m is the mass, q(t) is the relative displacement, $v(t) = \dot{q}(t)$ is
the velocity, $f_r(v,t)$ is the friction force, u(t) is the control force,
$k_0$ is the system gain, and d(t) is a bounded external disturbance due to
measurement noise or other load forces. It is assumed that the external
disturbance is bounded by an unknown upper bound $\bar{d}$:

$$|d(t)| \le \bar{d}, \quad t \ge 0 \qquad (11)$$

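The stick-slip friction force $f_r(v,t)$ is assumed to be modeled as follows
[7,8]:

$$f_r(v, t) = F_{slip}(v)\,a(v) + F_{stick}(u)\,[1 - a(v)] \qquad (12)$$

$$a(v) = \begin{cases} 1, & |v(t)| > \alpha \\ 0, & |v(t)| \le \alpha \end{cases}, \qquad \alpha > 0 \qquad (13)$$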



The sticking friction provides the value of the friction forces at zero
velocity. This term is used to describe whether the mass will stick or break
free from the static friction forces. The positive and negative limits on the
static friction forces are given by $F_s^+$ and $F_s^-$, respectively.
Generally, they are not equal in magnitude, and the model should take this
asymmetry into account. The sticking friction is modeled as follows:

$$F_{stick}(u) = \begin{cases} F_s^+, & u(k) \ge F_s^+ \\ u(k), & F_s^- < u(k) < F_s^+ \\ F_s^-, & u(k) \le F_s^- \end{cases} \qquad (14)$$

The mass cannot move until the applied force is greater in magnitude than
the respective static friction force. The slipping function $F_{slip}(v)$
provides the values of the friction at non-zero velocity and is given by:



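$$F_{slip}(v) = F^+(v)\,b(v) + F^-(v)\,b(-v) \qquad (15)$$

$$b(v) = \begin{cases} 1, & v > 0 \\ 0, & v \le 0 \end{cases} \qquad (16)$$

$$F^+(v) = F_s^+ - \Delta F^+ [1 - e^{-v/v_{cr}^+}] + \beta^+ v \qquad (17)$$

$$F^-(v) = F_s^- - \Delta F^- [1 - e^{-v/v_{cr}^-}] + \beta^- v \qquad (18)$$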



where $\Delta F^+$ and $\Delta F^-$ are the respective drops from the static to
the kinetic force level; $v_{cr}^+$ and $v_{cr}^-$ are the critical Stribeck
velocities; and $\beta^+$ and $\beta^-$ are the viscous friction coefficients.
The friction force is modeled as the sum of the Coulomb friction, the viscous
friction and the Stribeck effect. The Stribeck effect models the fact that the
friction force decreases with increasing fluid lubrication. Some models [1]
consider a dynamic lag of the friction force with respect to the velocity; this
effect can be neglected. For the sake of simplicity, some slip friction
parameters can be taken as equal for both velocity directions
(e.g. $v_{cr}^+ = v_{cr}^- = v_{cr}$; $\beta^+ = \beta^- = \beta$).
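The friction model (12) to (18) can be written as a small function. The following is a sketch rather than the authors' code: the numerical values are the friction parameters listed in Section 4, the function names are hypothetical, and the use of |v| in the Stribeck exponent is an assumption made here so that the exponential decays in both velocity directions when a single critical velocity is used.

```python
import math

# Friction parameter values as listed in Section 4 of the paper.
ALPHA = 0.001                  # velocity threshold of a(v), m/s
FS_POS, FS_NEG = 4.2, -4.0     # static friction limits, N
DF_POS, DF_NEG = 1.8, -1.7     # drops from static to kinetic level, N
V_CR, BETA = 0.1, 0.5          # Stribeck velocity (m/s), viscous coefficient (Ns/m)

def f_stick(u):
    """Eq. (14): the applied force, saturated at the static friction limits."""
    return max(FS_NEG, min(FS_POS, u))

def f_slip(v):
    """Eqs. (15)-(18); |v| in the exponent is an assumption of this sketch."""
    stribeck = 1.0 - math.exp(-abs(v) / V_CR)
    if v > 0:
        return FS_POS - DF_POS * stribeck + BETA * v
    if v < 0:
        return FS_NEG - DF_NEG * stribeck + BETA * v
    return 0.0                 # b(v) = b(-v) = 0 at v = 0

def friction(v, u):
    """Eq. (12) with the switching function a(v) of eq. (13)."""
    a = 1.0 if abs(v) > ALPHA else 0.0
    return f_slip(v) * a + f_stick(u) * (1.0 - a)
```

Below the velocity threshold the friction simply balances the applied force up to the static limit; above it the force drops towards the kinetic level and then grows with the viscous term.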

4 Simulation Results 

Let us consider a DC-motor-driven nonlinear mechanical system with the
following friction parameters: α = 0.001 m/s; $F_s^+$ = 4.2 N; $F_s^-$ = -4.0 N;
$\Delta F^+$ = 1.8 N; $\Delta F^-$ = -1.7 N; $v_{cr}$ = 0.1 m/s; β = 0.5 Ns/m.
Let us also assume that position and velocity measurements are taken with a
discretization period $T_0$ = 0.1 s, that the system gain is $k_0$ = 8, that
the mass is m = 1 kg, and that the load disturbance depends on the position and
the velocity ($d(t) = d_1 q(t) + d_2 v(t)$; $d_1$ = 0.25; $d_2$ = -0.7). The
discrete-time model of the 1-DOF mass mechanical system with friction is then
obtained in the form:

$$x_1(k+1) = x_2(k)$$

$$x_2(k+1) = -0.025\,x_1(k) - 0.3\,x_2(k) + 0.8\,u(k) - 0.1\,f_r(k) \qquad (19)$$

$$v(k) = x_2(k) - x_1(k) \qquad (20)$$

$$y(k) = 0.1\,x_1(k) \qquad (21)$$

where $x_1(k)$, $x_2(k)$ are the system states; v(k) is the system velocity and
y(k) is the system position; k is the discrete time variable; and the friction
force $f_r(k)$ is governed by equations (12) to (18) with the given values of
the friction parameters.

Fig. 1. On-line system identification. Continuous line: positive RTNN-1 output;
dashed line: negative RTNN-2 output. Learning parameters: η = 0.01, α = 0.01.
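The discrete-time model (19) to (21) is easy to step through by hand or in code. The sketch below is only an illustration: the function names are hypothetical, and the friction force is passed in as a plain number (set to zero in the demonstration) so that the arithmetic of the state update can be followed in isolation.

```python
def plant_step(x1, x2, u, fr):
    """One step of eq. (19); fr is the friction force f_r(k)."""
    x1_next = x2
    x2_next = -0.025 * x1 - 0.3 * x2 + 0.8 * u - 0.1 * fr
    return x1_next, x2_next

def outputs(x1, x2):
    v = x2 - x1          # eq. (20)
    y = 0.1 * x1         # eq. (21)
    return v, y

# Open-loop response to a constant input, with friction switched off for brevity.
x1, x2 = 0.0, 0.0
trajectory = []
for k in range(3):
    x1, x2 = plant_step(x1, x2, u=1.0, fr=0.0)
    trajectory.append(outputs(x1, x2))
```

In a closed-loop simulation, `fr` would be computed at every step from the stick-slip model and `u` from the control law (8).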

Simulation results of the on-line identification and control of the mechanical
system with friction are given in Figs. 1, 2, 3, 4 and 5. Fig. 1 shows the
10-second result of on-line system identification by means of two RTNN models
(positive and negative). Both RTNNs have the same architecture, with one input,
one output and two hidden nodes. The learning parameters used are η = 0.01,
α = 0.01. The state and parameter information obtained by RTNN 1 and 2 is used
for system control. The reference input signal is a pulse train with frequency
0.5 and amplitude 7. Fig. 2 shows the reference signal tracking by the
fuzzy-neural system, applying the control law (8) and the fuzzy rule (9).
Fig. 3 shows the control signal. The instantaneous and the MSE tracking errors
are given in Fig. 4 and Fig. 5, respectively. The on-line simulation results
show that an overshoot of the control, due to improper identification, occurs
at the beginning, but after a few seconds this error decreases to normal values.

5 Conclusions 

A comparative study of various mechanical systems with friction compensation
has been carried out. The paper proposes the use of two RTNN models for
mechanical system identification and control, and gives their configuration as
a fuzzy-neural indirect adaptive control structure. The proposed recurrent
fuzzy-neural multi-model approximates the complete nonlinear system dynamics,
including all nonlinear static and dynamic friction effects. As this model is
dynamic in nature, it is also well suited to approximating unknown dynamic
effects (such as elasticity). The RTNN model has a Jordan canonical system
structure, which permits its parameters and states to be used directly in the
design of feedforward/feedback control systems. A dynamic Backpropagation-type
learning algorithm for training the RTNN model is also described. The
simulation results of nonlinear mechanical system identification and control by
means of two RTNN models and two fuzzy identification and control rules show
good convergence and confirm the applicability of the RTNN multi-model.

Fig. 2. Reference signal tracking. Continuous line: system output; dashed line:
reference signal. The reference signal is a pulse train with frequency 0.5 and
amplitude 7.0.

Fig. 3. Control signal generated by the fuzzy-neural system. The parameter
γ = 0.9.

Fig. 4. Instantaneous error.

Fig. 5. MSE of reference signal tracking.
Abstract. Document clustering based on semantics is a fundamental
method of helping users to search and browse in large collections of
documents. Recently a number of papers have reported applications
of self-organizing artificial neural networks to document clustering
based on semantics. In particular, Growing Neural Gas (GNG) is a growing
neural network that allows the user to reproduce the topological
distribution of the inputs, but the structure obtained often has the same
complexity as the input data structure; if the input space has more than
three dimensions it is impossible to visualize or represent the GNG
network or the input data structure. In this paper the authors
propose a modified LBG network, called LBG-m, that can simplify the
GNG structure in order to visualize and summarize it. The two
algorithms constitute a tool for browsing large document sets and
generating a set of semantic links between clusters of similar
documents.



1 Introduction 

Direct browsing into a document set is the usual way of searching for information if 
the topic is not familiar and it is not possible to formulate a satisfactory query. 
Browsing into a document set of hundreds or thousands of documents is certainly an 
imposing task but it can be made easier if the user is guided within the document 
space. To do this the document space must be ordered in some way; document 
clustering or taxonomies are the easiest and the most direct way of doing this. 
Artificial intelligence can provide useful and effective algorithms for organizing or
arranging data into clusters, such as artificial neural network (ANN) models. These
models are inspired by our present understanding of the biological nervous system
and are made up of a dense interconnection of simple non-linear computational
elements corresponding to the biological neurons. Each connection is characterized by
a variable weight that is adjusted, together with other parameters of the net, during the
so-called "learning stage".

The self-organizing networks, and in particular the Self-Organizing Feature Map
(SOM) [3], are ANNs that try to build a representation of some feature of the input
vectors used as the "learning input set" during the learning stage. Recently this network
has been used to classify information and documents in "document maps". These are
two-dimensional graphical representations in which all the documents in a document
set are depicted. The documents are grouped into clusters which all concern the same
topic, and clusters related to similar topics are near each other on the map.

The Growing Neural Gas (GNG) [7] is a self-organizing neural network that has no
pre-defined lattice, like the two-dimensional SOM grid, but is able to generate a set of
interconnected clusters in the input space. The GNG network is capable of following
the input distribution even when it is complicated or has different dimensions [5]. The
topology of the GNG structure reflects the properties and complexity of the input
vector distribution, so that visualization is only possible for two- or three-dimensional
input spaces [1], but the absence of topological constraints makes it
attractive and particularly suitable for reproducing the original structure of a
hypertext. A GNG network was used in [9] to organize the nodes of a hypertext, and
the original structure of the hypertext was compared to the structure generated by the
neural network. The GNG network generates 2356 links between documents, of which
146 are in the original hypertext (40.1 % of the total number of links), but the structure
generated is difficult to manage. In fact the absence of a simple geometric structure
makes it difficult to have an overall idea of the set of documents, something which
is possible with a SOM map (a two-dimensional map which is easy to visualize,
understand and remember).

Consequently, on the one hand we have the SOM network, which is a good visualization
tool [10], [11]; on the other hand, by using the GNG network it is possible to organize
the documents in a hypertext fashion, even though the two-dimensional visualization
is lost. A possible solution to this problem is to use another neural network to create
an overall representation of the GNG network when this network is trained in a
high-dimensional input space. The network used is a sort of modified LBG network [2],
called the LBG-m network, that is trained by using the GNG network as input. After the
learning stage of the LBG-m network, the links between the neurons are taken into
account and a link set is created between the LBG-m units in order to reproduce the
configuration of the input GNG network. The two networks, the LBG-m and the GNG,
constitute a tool for navigating large document sets: the GNG network is used to
obtain the document clusters connected by a link structure, and the LBG-m algorithm
is used to obtain an upper-level structure that gives an idea of the structure of the
cluster distribution in the input space.



2 The LBG Algorithm 

The LBG algorithm allows the user to build a set of code vectors by moving them
to the center of their Voronoi sets. The algorithm converges, through a finite number
of adaptation steps, to a local minimum of the distortion error function. The initial
values of the network units determine the local minimum, so different initial
values can give very different results. Another drawback of the algorithm is the
presence of "dead units": units that have an empty Voronoi set (i.e. without input
vectors); this problem can be avoided by using a subset of the input vectors as initial
values.

Assuming that the LBG network has a set of $N_L$ units:

$$L = \{l_1, l_2, \ldots, l_{N_L}\}$$

and each of them has a reference vector $w_{l_i}$

$$w_{l_i} = [w_1, w_2, \ldots, w_n]$$

the learning algorithm of the LBG network follows:

1. Initialize the set of code vectors $w_{l_i}$ using a subset of the input data D

$$D = \{d_1, d_2, \ldots, d_{N_D}\}$$

2. For each unit $l_i$ of LBG find its Voronoi set

$$V_{l_i} = \{d \in D \mid \|d - w_{l_i}\| \le \|d - w_{l_j}\| \ \ \forall\, l_j \in L\}$$

3. Move each unit to the mean of its Voronoi set

$$w_{l_i} = \frac{1}{|V_{l_i}|} \sum_{d \in V_{l_i}} d$$

4. If during step 3 a unit changes place then go back to step 2, otherwise go to
step 5.

5. Return the current set of LBG code vectors
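The five steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `lbg`, the random choice of the initial subset and the iteration cap `max_iter` are assumptions of the sketch.

```python
import numpy as np

def lbg(data, n_units, seed=0, max_iter=100):
    """Plain LBG: returns the code vectors and the Voronoi owner of each input."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the code vectors with a subset of the input data,
    # which avoids dead units at the start.
    start = rng.choice(len(data), size=n_units, replace=False)
    codebook = data[start].astype(float)
    for _ in range(max_iter):
        # Step 2: Voronoi sets (index of the nearest code vector for every input).
        dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
        owner = dists.argmin(axis=1)
        # Step 3: move each unit to the mean of its Voronoi set.
        moved = codebook.copy()
        for i in range(n_units):
            members = data[owner == i]
            if len(members) > 0:
                moved[i] = members.mean(axis=0)
        # Step 4: stop when no unit has changed place.
        if np.allclose(moved, codebook):
            break
        codebook = moved
    # Step 5: return the current set of code vectors.
    return codebook, owner

# Two well-separated square clusters as a toy input set.
square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
data = np.vstack([square, square + 10.0])
codebook, owner = lbg(data, n_units=2)
```

On this toy input the two code vectors settle on the centers of the two squares, whichever pair of points is drawn as the initial subset.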

Recently the LBG-U algorithm has been proposed [6]: this algorithm can reduce 
the number of codebook vectors and determine a better approximation of the input 
data distribution. 



3 The Proposed LBG-m Algorithm 

The aim of the LBG-m algorithm is to build a new structure that can give an overall 
picture of the distribution and of the “shape” of a trained GNG network, using the 
smallest possible number of neurons. Moreover the LBG-m can link the units in such 
a way as to "reproduce" the "shape" of the GNG network in the input space. The 
LBG-m is based on the LBG algorithm; it has to be trained using a GNG network as 
input. During the learning stage of the LBG-m algorithm the GNG network is divided 
into parts, one for each Voronoi set of the LBG-m units, and a new LBG-m unit is 
added for each unit that has an unconnected GNG graph inside its Voronoi region.

In fig. 1 the LBG-m units A and B are shown; the gray areas represent the input
distribution. The LBG-m unit A has an unconnected GNG graph in its Voronoi region,
so a new LBG-m unit will be added in the area A.




Fig. 1. An example of the unit adding method 

The LBG-m learning stage is repeated each time a new unit is added, until no
further unit is added or the maximum number of units is reached. At the end the
LBG-m units are linked together if the corresponding GNG subgraphs are linked together.

Assume that the Growing Neural Gas network consists of a set of $N_G$ units

$$G = \{g_1, g_2, \ldots, g_{N_G}\}$$

where each unit has a reference vector $w_{g_i}$,
and assume there is a set of connections C between the GNG units:

$$C \subset G \times G$$

If $N_{Lmax}$ is the maximum number of LBG-m units allowed, the LBG-m algorithm
follows:

1. Initialize an LBG network with two units

2. While end == false

2.1. Update the position of the LBG-m units according to the LBG algorithm,
using the set $W_G$ (the reference vectors of the GNG network) as input;

2.2. For each unit $l_i$ of the LBG-m network

2.2.1. Take the set $V_{l_i}$ of GNG units in the Voronoi region corresponding
to $l_i$ and the connection set $C_{l_i}$

$$V_{l_i} = \{g_j \in G \mid \|l_i - g_j\| \le \|l_k - g_j\| \ \ \forall\, l_k \in L\}$$

$$C_{l_i} = \{(g_1, g_2) \in C \mid g_1, g_2 \in V_{l_i}\}$$

2.2.2. If the graph defined by $C_{l_i}$ is not connected and $N_L < N_{Lmax}$
then add a new unit to the LBG-m network.

2.3. If no neuron was added then end = true

3. Update the position of the LBG-m units.

4. Create a link between the LBG-m units that have, in their $V_{l_i}$ regions,
connections that extend beyond their Voronoi regions.
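The loop above can be sketched compactly. The code below is a hedged illustration and not the authors' implementation: the function names are hypothetical, the GNG network is represented simply as an array of reference vectors plus a list of index pairs, and a new unit is initialized at a random point of the bounding box of the Voronoi region, which is only one possible reading of the hypercube criterion discussed in the text.

```python
import numpy as np

def nearest(codebook, data):
    """Index of the closest code vector for every data vector."""
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def lbg_update(codebook, data, max_iter=100):
    """Step 2.1: plain LBG iterations on the GNG reference vectors."""
    for _ in range(max_iter):
        owner = nearest(codebook, data)
        moved = codebook.copy()
        for i in range(len(codebook)):
            if np.any(owner == i):
                moved[i] = data[owner == i].mean(axis=0)
        if np.allclose(moved, codebook):
            break
        codebook = moved
    return codebook

def is_connected(nodes, edges):
    """True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    stack = [next(iter(nodes))]
    seen = set(stack)
    while stack:
        for m in adj[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen == nodes

def lbg_m(gng_vectors, gng_edges, max_units, seed=0):
    rng = np.random.default_rng(seed)
    start = rng.choice(len(gng_vectors), size=2, replace=False)
    codebook = gng_vectors[start].astype(float)               # step 1
    while True:                                                # step 2
        codebook = lbg_update(codebook, gng_vectors)           # step 2.1
        owner = nearest(codebook, gng_vectors)
        added = False
        for i in range(len(codebook)):                         # step 2.2
            members = np.flatnonzero(owner == i).tolist()
            if len(codebook) < max_units and not is_connected(members, gng_edges):
                box = gng_vectors[members]                     # bounding box (assumption)
                new_unit = rng.uniform(box.min(axis=0), box.max(axis=0))
                codebook = np.vstack([codebook, new_unit])
                added = True
        if not added:                                          # step 2.3
            break
    owner = nearest(codebook, gng_vectors)    # positions are already up to date (step 3)
    links = {(int(min(owner[a], owner[b])), int(max(owner[a], owner[b])))
             for a, b in gng_edges if owner[a] != owner[b]}    # step 4
    return codebook, owner, links

# A single connected chain: no unit is added and the two units end up linked.
chain = np.array([[float(i), 0.0] for i in range(6)])
chain_edges = [(i, i + 1) for i in range(5)]
units, owner, links = lbg_m(chain, chain_edges, max_units=5)
```

With a connected chain as input the algorithm stops at the two initial units, which mirrors the case described for simple input distributions; inputs made of several disconnected GNG subgraphs force at least one additional unit.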




Fig. 2a. The "cactus" input distribution and the GNG approximation 



Adding a new unit to the LBG-m network modifies the error function, so different 
strategies for adding a new unit to the LBG-m structure have been tried; obviously 
this can affect the topology of the network obtained. Some of the criteria used to 
initialize a new unit are the following: 

- random values; this criterion gives some dead units;

- the centroid of one of the sections of the graph; this gives poor results;

- random values inside the hypercube obtained by using the max component of
the vectors in the Voronoi region; this criterion has given the best results.

To depict the LBG-m algorithm the classical "cactus" image was chosen. Fig. 2a
shows the GNG approximation of the input distribution; the network
obtained using the LBG-m algorithm is shown in fig. 2b. This picture also shows the
section of the GNG graph that belongs to each LBG-m unit.

In fig. 3 the GNG approximation of another input distribution is shown, with the
resulting LBG-m network in bold print. In this picture it is also possible to see that the
network seems to approximate the GNG network using an error minimization criterion
(not the constant entropy criterion). In fact in the rectangle to the right, where the input
vectors are denser, there are many GNG units but only one LBG-m unit, so that no
information is provided on the distribution of the inputs. The reason for this is that a
new unit is added by using topology considerations.




Fig. 3. The LBG-m approximation of another GNG structure 



It has to be said that the LBG-m gives its worst results if the input distribution is
not very complex or widely scattered in the input space, as shown in fig. 4. In this
case only two LBG units, A and B, are obtained, because each of them has a
connected graph in its Voronoi region and the algorithm does not add any new unit.

Finally the LBG-m algorithm is compared to another GNG network which is
trained using the results of the first one: the results obtained show that the LBG-m
algorithm works better. This is not surprising: the GNG network works better if the
number of neurons is high enough. Fig. 5 shows the results obtained by training a
10-unit GNG; as can be seen, the GNG network requires more units to
approximate the input "shape".




Fig. 5. The GNG approximation of “cactus” distribution using 10 units 



4 An Application of LBG-m Algorithm for Document 
Classification 

The proposed algorithm is applied to a set of inputs from a high-dimensional space:
the vectors that represent a set of documents. Assuming a dictionary vector D, each
document can be represented as a vector V where the element $v_i$ is the weight of the
word $d_i$ for that document. The word weight can be calculated using the Term
Frequency * Inverse Document Frequency (TFIDF) scheme, which calculates the
interest value of the word [12].

308 R. Rizzo and E. G. Munna

The document collection used to test the LBG-m
algorithm is an HTML hypertext course on hypermedia and hypertext (available on
the Internet at http://wwwis.win.tue.nl/2L670/course.zip). This hypertext is made up
of 162 nodes and 357 links. The vocabulary consists of 6568 words; however, only
536 words are used to build the document representation vector for each node; these
words have been obtained by neglecting stopwords (like articles) and rare words. The
training set is made up of 162 vectors of 536 real components obtained by using the
TFIDF transformation.
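The construction of the document vectors can be illustrated with a few lines of code. The paper does not spell out the exact weighting formula, so the sketch below assumes the common form tf * log(N / df); the function name `tfidf_vectors`, the whitespace tokenizer and the `min_df` threshold for rare words are likewise hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs, stopwords=frozenset(), min_df=1):
    """Return the vocabulary and one TFIDF vector per document."""
    tokenized = [[w for w in d.lower().split() if w not in stopwords] for d in docs]
    # Document frequency: number of documents in which each word occurs.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Neglect rare words (assumed criterion: document frequency below min_df).
    vocab = sorted(w for w, count in df.items() if count >= min_df)
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors

docs = ["the map map", "the gas"]
vocab, vectors = tfidf_vectors(docs, stopwords=frozenset({"the"}))
```

Each document becomes a vector with one real component per retained word, which is the form of input expected by the GNG network.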

Using the GNG a network composed of 43 linked clusters has been obtained, with 
a mean of 4 documents for each unit. This structure is in a space of 536 dimensions so 
it is difficult to inspect or to visualize. 

Using the LBG-m network it is possible to obtain the graph in fig. 6. 




Fig. 6. The LBG-m structure obtained from the hypertext data

Near each LBG-m unit in fig. 6 is shown the number of GNG units that are in the 
Voronoi region. It can be noted that there is a big cluster of documents near the LBG- 
m unit 2, and 4 small clusters of documents: the documents near the LBG-m unit 1 are 
about the hypertext model called tower, the documents near unit 3 are about the 
browsing strategies of hypertexts. 

The structure built by the LBG-m is visualized in an HTML table, and can be used
to browse the set of documents; users can handle the 5 units easily, and they may also
find the labels assigned to each LBG-m unit helpful. Users can access the GNG
network when they want to look at the information structure in more detail or when
they want to read a document.



5 Conclusions 

Browsing a document space of hundreds of documents is really an imposing task,
but it can be made easier if a hypertext-like structure is created in the document space.
The GNG network can create this structure by sorting the documents into clusters and
linking these clusters together. Generally speaking, the GNG algorithm approximates
an input distribution in a flexible way, by adding new neurons when it is necessary to
reduce the approximation error, but if the input space has more than 3 dimensions,
the visualization of the GNG network structure is very difficult. Neither is it possible
to build another GNG structure using the first one as input, because the GNG network
works well only if the number of neurons is quite high. The proposed LBG-m
algorithm can approximate the structure of a GNG network using a low number of
units and can reproduce the "shape" of the GNG structure. The results obtained also
show that the network tries to minimize the quantization error, so only a small number
of units is used to approximate a dense input region.



References 

1. Fritzke B., "Growing Self-Organizing Networks - Why?", ESANN'96: European 
Symposium on Artificial Neural Networks, Brussels 1995, p. 61-72 

2. Linde Y., Buzo A., Gray R. M., "An Algorithm for Vector Quantizer Design", 
IEEE Transactions on Communication, COM-28:84-95, 1980. 

3. Kohonen T., "Self Organizing Maps", Springer Verlag 

5. Tesauro G., Touretzky D. S., Leen T. K. (eds.), "A Growing Neural Gas Network
Learns Topologies", Advances in Neural Information Processing Systems 7,
MIT Press, Cambridge MA, 1995, p. 625-632.

6. Fritzke B., "The LBG-U method for vector quantization - an improvement over 
LBG inspired from neural networks". Neural Processing Letters, 5 (1), 1997. 

7. Fritzke B., "A growing neural gas network learns topologies", NIPS 1994, 
Denver. 

8. Balabanovic M., Shoham Y., "Learning Information Retrieval Agents: 
Experiments with Automated Web Browsing", Proceedings of the AAAI Spring 
Symposium on Information Gathering from Heterogenous, Distributed 
Resources, Stanford, CA, March 1995. 

9. Rizzo R., "Self Organizing Networks to Map Information Space in Hypertext
Development", Proceedings of the International ICSC/IFAC Symposium on
Neural Computation NC'98, September 23-25, 1998, Vienna, Austria.

10. Rizzo R., Allegra M., Fulantelli G., "Hypertext-like Structures through a SOM
Network", in Proc. of ACM Hypertext '99 (Darmstadt, Germany, Feb. 21-25,
1999).

11. Rizzo R., Allegra M., Fulantelli G., "Hy.Doc: a System to Support the Study of
Large Document Collections", in Proc. of ICL99 workshop (Villach, Austria,
Oct. 7-8, 1999). ISBN 3-7068-0755-6.

12. Salton G., Allan J., Buckley C., "Automatic Structuring and Retrieval of Large Text
Files", Communications of the ACM, 37, 2, 1994, pp. 97-108.




User Authentication via Neural Network 



Abd Manan Ahmad and Nik Nailah Abdullah 

Software Engineering Department, Faculty of Computer Science and Information
Technology, Universiti Teknologi Malaysia, Skudai 81310 Johor, Malaysia

e-mail: manan@fsksm.utm.my



Abstract. A major problem in computer systems is that users are
now able to access data from remote places and perform transactions
on-line. This paper reports on the experiment and performance of using
keystroke dynamics as a user authentication method. The work is
designed so that the computer system can distinguish
authorized from unauthorized users. This is desired in order to control access to a
system that will assign the authorized user upon entering the system.
The technique used to discriminate the data is the neural network. This
paper describes the application of neural networks to the problem of
identifying specific users through the typing characteristics exhibited
when typing their own name. The test carried out uses two kinds of
neural network model, i.e. ADALINE and the Backpropagation Network.
A comparison of these two techniques is presented.



1 Introduction 

Identifying a person seems straightforward. People do it all the time. But modern
society has complicated things: a hacker accesses a sensitive database and a
counterfeiter makes copies of bank cards. At all levels, a sure-fire means of
identification has never been in more demand. Today the average businessperson may
use more than a dozen computer passwords, or personal identification numbers (PINs),
for automated teller machines, licenses and telephone calling, membership, and credit
cards. Yet finding satisfactory methods of identifying users can be difficult. Some
techniques are easy to fool and others are felt to be too intrusive.

One area where technology is enhancing, and often simplifying, our ability to
identify people is biometrics. Biometric systems are automated systems for verifying
or recognizing the identity of a living person on the basis of some physiological
characteristic, like a fingerprint or iris pattern, or some aspect of behavior, like
handwriting or keystroke patterns. Verification requires the person being identified to
lay claim to an identity, so that the system has a binary choice of either accepting or
rejecting the person's claim. Biometrics is also catching on in automated teller
machines (ATMs). Still, it is a steady technology. Although there are a number of other
automated biometric systems, such as iris recognition and voice recognition, the tools
needed to capture the data are costly.

Fingerprints vary with emotional state. Voice, too, is cheap to capture, relying on
low-cost microphones or existing telephones, but varies when emotions and states of
health change, and has a large template size. Keystroke dynamics, also known as
typing rhythms, is one of the most eagerly awaited of all biometric technologies in
the computer security arena. This method analyzes the way a user types at a terminal
by monitoring the keyboard inputs 1000 times per second. The analogy is made to
the days of telegraphy, when operators would identify each other by recognizing the
"fist of the sender". The modern system has some similarities, most notably that the
user does not realize he is being identified unless told. Also, the better the user is at
typing, the easier it is to make the identification. Both the National Science Foundation,
Washington, D.C., and the National Institute of Standards and Technology,
Gaithersburg, MD, have conducted studies establishing that typing patterns are unique
to the typist.

The advantages of keystroke dynamics in the computer environment are obvious.
Neither the enrolment nor the verification disturbs the regular flow, because the user
would be tapping the keys anyway. Since the input device is the existing keyboard,
the technology costs less. Despite the increased activity, published reports of the basic
data and/or methodology are noticeably missing. Thus, this case study was carried out
in the hope of finding the best technique for classifying and filtering the
data collected from the users. The limitation of the work was the capture of the user's
typing data. The experiment carried out was based on data previously collected from the
inventor of the system, Joey Rogers.



2 Keystroke Dynamics 

The system is based upon the concept that the coordination of a person’s fingers is
neurophysiologically determined and unique for a given genotype. A user's typing or
keystroke characteristics can be measured by examining the timing of the keystrokes
or the pressure of the keystrokes. Vectors are used to represent the data. The vector was
constructed using interleaved hold times and digraph latency times.

The hold time of a key is obtained by subtracting the press time of the key
from the release time of the key. A digraph is a two-keystroke combination. The
digraph latency is obtained by subtracting the first key’s release time from the second
key’s press time. The ordering of the elements of the vector is not important, but the
vectors should be constructed such that all samples relating to a user are constructed in
the same manner, so that they can be properly compared. The components of the vectors
are physical characteristics of a person’s keystrokes.
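The two timing features can be sketched in a few lines of Python. This is only an illustration: the function names and the tuple layout (key, press time, release time) are assumptions, and the sample uses the first three keystrokes listed in Table 2.0(a).

```python
from typing import List, Tuple

# One keystroke: (key name, press time in ms, release time in ms), in typing order.
Keystroke = Tuple[str, int, int]


def hold_times(keys: List[Keystroke]) -> List[int]:
    """Hold time = release time minus press time of the same key."""
    return [release - press for _, press, release in keys]


def digraph_latencies(keys: List[Keystroke]) -> List[int]:
    """Latency = press time of the second key minus release time of the first."""
    return [keys[i + 1][1] - keys[i][2] for i in range(len(keys) - 1)]


def feature_vector(keys: List[Keystroke]) -> List[int]:
    """Interleave hold times and latencies: h0, l0, h1, l1, ..., h_last."""
    holds = hold_times(keys)
    latencies = digraph_latencies(keys)
    vector: List[int] = []
    for i, hold in enumerate(holds):
        vector.append(hold)
        if i < len(latencies):
            vector.append(latencies[i])
    return vector


# First three keystrokes of Table 2.0(a): SHIFT, J, O.
sample = [("SHIFT", 0, 177), ("J", 87, 186), ("O", 220, 347)]
vector = feature_vector(sample)
print(vector)
```

A negative latency simply means that the second key was pressed before the first one was released. With 14 keystrokes this layout yields 14 hold times and 13 latencies, that is, 27 elements.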

These physical characteristics are used to construct vectors which are processed,
transmitted and stored within the system as signals. The vector can be made up of
data pertaining to the key press times, key release times, digraph latency times, key
hold times, keystroke pressure, keystroke acceleration or deceleration, or any features
relating to the user’s keystroke characteristics. Once the data is collected and placed
in vector format, the vectors can be analyzed to determine if the user is authorized or
an impostor. The data was normalized using the linear transformation method, which
is shown below:

$$\bar{x}_i = \frac{1}{N} \sum_{n=1}^{N} x_i^n$$

$$\sigma_i^2 = \frac{1}{N-1} \sum_{n=1}^{N} \left( x_i^n - \bar{x}_i \right)^2$$

where n = 1, ..., N indexes the patterns.

Next, we need to define a set of rescaled variables, given by:

$$\tilde{x}_i^n = \frac{x_i^n - \bar{x}_i}{\sigma_i}$$
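The normalisation above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `normalise` is hypothetical, each row is one pattern, and only the first row uses values from the tables in this section; the other two rows are made up for the example.

```python
from statistics import mean, stdev
from typing import List


def normalise(patterns: List[List[float]]) -> List[List[float]]:
    """Rescale each component to zero mean and unit variance over the N patterns."""
    columns = list(zip(*patterns))
    means = [mean(col) for col in columns]
    sigmas = [stdev(col) for col in columns]  # sample deviation, 1/(N-1) form
    return [[(x - m) / s for x, m, s in zip(row, means, sigmas)]
            for row in patterns]


# First row: leading hold and latency values of the authorised user;
# the remaining rows are hypothetical.
patterns = [[177.0, -90.0, 99.0],
            [160.0, -70.0, 110.0],
            [190.0, -95.0, 91.0]]
scaled = normalise(patterns)
for row in scaled:
    print([round(value, 3) for value in row])
```

A component that is constant over all patterns has zero variance and would need separate handling before the division.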



Below is the data of the authorized user, Joey Rogers, who is the sole inventor of
this system.



Table 2.0(a) The raw data of the authorized user, Joey Rogers

PRESS        RELEASE      TIME
SHIFT                     0
J                         87
             SHIFT        177
             J            186
O                         220
E                         339
             O            347
Y                         409
             E            462
BACKSPACE                 535
             Y            571
SHIFT                     659
R                         769
             SHIFT        821
             R            859
O                         894
G                         974
             O            1012
             G            1054
E                         1076
R                         1179
             E            1243
S                         1285
             R            1400
             S            1433
ENTER                     1478
             ENTER        1572

Table 2.0(b) Data of hold time

KEY          HOLD
SHIFT        177
J            99
O            127
E            123
Y            162
BACKSPACE    133
SHIFT        162
R            90
O            118
G            80
E            167
R            221
S            148
ENTER        94

Table 2.0(c) Digraph latency time

DIGRAPH            LATENCY
SHIFT;J            -90
J;O                34
O;E                -8
E;Y                -53
Y;BACKSPACE        -36
BACKSPACE;SHIFT    -9
SHIFT;R            -52
R;O                35
O;G                -38
G;E                22
E;R                -64
R;S                -115
S;ENTER            45



The data gathered from the user needs to be normalized before it is fed into the
neural network.




Fig. 2.0(d) Algorithm for keystroke dynamics method

3 Neural Networks Method

The main concern of this research was to find the best method to discriminate/purify
the data collected. In this study, two kinds of neural network model and architecture
were used as the basic methodology. This paper describes the
application of neural networks to the problem of identifying specific users through the
typing characteristics exhibited when typing their own names.

The network was chosen based on the problem to be solved. First of all, a
previous study compared two methods for discriminating the data, namely
geometric distance and Euclidean distance. The system under investigation
was then tested using two kinds of neural network architecture and model. These are
the ADALINE and the Backpropagation network. The network was chosen based on network
model, architecture, data and the type of problem. The choice of network model
depends heavily on the type of problem to be solved. The nature of the
problem usually restricts the choice of network to one or two models. Sometimes the
choice of network comes down to personal preference or familiarity. In this case, the
problem to be solved is a pattern classification problem, since the network needs to
determine which pattern belongs to the authorised or the non-authorised user. The input
layer consists of 27 nodes, which is equal to the number of input elements. The
middle layer is a single layer with 24 nodes, which is about 90% of the number of input
nodes. There will be two output nodes, 1 for the authorised user and 0 for the
non-authorised user.

The availability and integrity of data constitute the most important factor for
training neural networks. The data should fully represent all possible states of the
problem being tackled and there should be sufficient data to allow test and validation
data sets to be extracted. The right preparation of data is needed to ensure the
accuracy of the output. Since the sigmoid activation function is used as the
transfer function, the network generates its output between 0 and 1. It is therefore
important to perform normalisation to scale the data so that it falls within this range.
During the experiment, the number of input nodes, learning rate value, number of hidden
nodes, momentum value and performance goal value were varied to find the most suitable
parameter values. The appropriate parameter values were chosen by trial and
error during the experiment, based on the convergence and goal performance
results.



3.1 ADALINE 

ADALINE is a processing element developed to take multiple input values and
produce a single output. The ADALINE’s significance is in its ability to learn the correct
outputs from a set of inputs. When new inputs that were not in the training set are
presented to the ADALINE, the outputs produced will be based on its training
experience. It uses the LMS algorithm for its operation. It consists of a linear
combiner, a hard limiter and a mechanism to change the weights. The weighted summation
of the inputs x1, x2, ..., xn is used to determine the output value of +1 or -1.

The control mechanism of the LMS depends heavily on the error signal e that is
measured by the difference between the desired output d and the linear combination u
before quantization. The weights w1, w2, ..., wp and the threshold value
change according to the LMS algorithm. ADALINE is usually used in applications
where the main concern is solving a linear problem or distinguishing noise in
signal processing. The LMS algorithm is an adaptive signal processing algorithm. The
formula for this algorithm originated from spatial filtering, and the method is commonly
used to solve temporal filtering problems. The filter is represented by a tapped delay
line as a signal-flow graph. The input vector is defined by the taps of the filter, as
shown below:

$$\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-p+1)]^T$$ [5]

where p is the number of taps. The function of the LMS algorithm is to change the tap
weights so that the filter output y(n) approximates the desired output in the least mean
square sense. The LMS operation consists of two processes that make up the feedback loop.
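A minimal sketch of the two LMS processes (filtering and weight adaptation) on a tapped delay line is given below. It assumes the standard LMS update w ← w + η e(n) x(n), which the text does not spell out; the names `lms_train`, `eta` and `epochs`, as well as the test signal, are illustrative assumptions, and the hard limiter is omitted.

```python
from typing import List, Sequence


def lms_train(signal: Sequence[float], desired: Sequence[float],
              taps: int, eta: float = 0.05, epochs: int = 200) -> List[float]:
    """Adapt the tap weights so that y(n) tracks desired[n] (assumed LMS rule)."""
    w = [0.0] * taps
    for _ in range(epochs):
        for n in range(taps - 1, len(signal)):
            # Tap-input vector x(n) = [x(n), x(n-1), ..., x(n-p+1)]
            x = [signal[n - k] for k in range(taps)]
            y = sum(wi * xi for wi, xi in zip(w, x))          # filtering process
            e = desired[n] - y                                 # error signal
            w = [wi + eta * e * xi for wi, xi in zip(w, x)]   # adaptive process
    return w


# Illustrative data: the desired signal is a known 2-tap filter of the input.
signal = [((7 * i) % 11 - 5) / 5.0 for i in range(40)]
desired = [0.0] + [0.5 * signal[n] - 0.3 * signal[n - 1] for n in range(1, 40)]
weights = lms_train(signal, desired, taps=2)
print([round(w, 4) for w in weights])
```

Because the desired signal here is an exact linear function of the taps, the weights approach the generating coefficients 0.5 and -0.3; real typing data would leave a residual error.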



3.2 Backpropagation Network 

The Backpropagation network used in this case study is a simple 3-layer
Backpropagation network. It consists of processing elements that can
produce complex output. It is capable of learning complex patterns, such as the pattern
of human typing. Moreover, a Backpropagation network can recognise patterns that
have undergone changes. The structure is almost the same as that of a Multi-Layer
Perceptron, where the perceptron uses Backpropagation as the learning algorithm.
Backpropagation is a type of feedforward paradigm that consists of 1 input layer, 1
or more hidden layers and 1 output layer. The input values are in binary form. The
activation function used is the sigmoid activation function.
Backpropagation is a supervised learning method. The Backpropagation paradigm is used
for complex logic operations, pattern classification and speech analysis. There is
a training set that consists of input vectors and the desired output vectors. The desired
output is a vector, not a single value.

$$I_k = (x_0, x_1, x_2, \ldots, x_n) \Rightarrow \mathrm{Desired\_O}_k = (z_0, z_1, z_2, \ldots, z_m)$$ [4]

The backpropagation neural network operates through two procedures, known as the
forward pass and the backward (feedback) pass.

i. The input layer propagates the components of the input vector to each node
of the middle layer.

ii. Each middle layer node calculates an output value, which in turn
serves as an input value for the nodes in the output layer:

$$M_i = f\left( \sum_k I_k \, W1_{ik} \right)$$

M = middle layer
W1 = connection weights between the input and middle layers
k = input node
i = middle layer node
f = activation function

Each output node calculates the network output for the input vector:

$$O_j = f\left( \sum_i M_i \, W2_{ij} \right)$$

O = output layer
j = output node
W2 = connection weights between the middle layer and output layer

The activation function used in the forward pass squashes the dot product of the
input vector and the weight vector into a value that is easy to manage.
The function must have certain characteristics in order to facilitate the learning
process: it must be non-linear, defined everywhere, and differentiable. The transfer
function used is shown below:

$$f(x) = \frac{1.0}{1.0 + e^{-x}}$$
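The forward pass can be sketched in Python as below. The weight layout (`w1[i][k]` from input k to middle node i, `w2[j][i]` from middle node i to output node j), the function names and the tiny 2-2-1 network are assumptions made for illustration; the network described in the text has 27 input and 24 middle nodes.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Transfer function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs: List[float], w1: Matrix,
            w2: Matrix) -> Tuple[List[float], List[float]]:
    """Return the middle-layer and output-layer activations for one input vector."""
    middle = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w1]
    output = [sigmoid(sum(w * m for w, m in zip(row, middle))) for row in w2]
    return middle, output


# Hypothetical weights for a 2-2-1 network.
w1 = [[1.0, -1.0], [0.0, 0.0]]
w2 = [[0.0, 0.0]]
middle, output = forward([2.0, 2.0], w1, w2)
print(middle, output)
```

Every weighted sum in this example is zero, so each node outputs f(0) = 0.5, which makes the sketch easy to check by hand.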

The forward pass produces an output vector for a given input vector based on the
current state of the network weights. Since the network weights are initialized to
random values, it is unlikely that reasonable outputs will result at first. The weights
are adjusted to reduce the error by propagating the output error backwards through the
network. This process is where the backpropagation neural network gets its name and
is known as the backward pass.

i. Compute the error values for each node in the output layer:

$$f'(x) = x(1 - x)$$

$$E_j = f'(O_j)\,(\mathrm{Desired\_O}_{kj} - O_j)$$

where $E_j$ is the error for each output node. The error for each node can be
computed because the desired output for each node is known. The difference is
multiplied by the derivative of the transfer function in preparation for the next
step.

ii. Compute the error for the middle layer nodes:

$$E1_i = f'(M_i) \sum_j W2_{ij} E_j$$

where $E1_i$ is the error for each middle layer node. This method of computing error
terms for the middle layer nodes is what makes the network a backpropagation
neural network.

iii. Adjust the weight values to improve the network performance using the Delta rule:

$$\Delta W1_{ik} = \beta \, E1_i \, I_k + \alpha \, \mathrm{Previous}(\Delta W1_{ik})$$

$$\Delta W2_{ij} = \beta \, E_j \, M_i + \alpha \, \mathrm{Previous}(\Delta W2_{ij})$$

$$W1 = W1 + \Delta W1, \qquad W2 = W2 + \Delta W2$$

iv. Compute the overall error to test the network performance.

The training set is repeatedly presented to the network and the weight values are
adjusted until the overall error is below a predetermined tolerance.
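Steps i to iv can be sketched as follows. This is an illustrative reading rather than the implementation used in the experiment: the names `backward_step` and `train`, the parameters `beta` (learning rate) and `alpha` (momentum), the sum of squared errors as the overall error, and the zero initial weights of the example are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs: Sequence[float], w1: Matrix, w2: Matrix):
    middle = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w1]
    output = [sigmoid(sum(w * m for w, m in zip(row, middle))) for row in w2]
    return middle, output


def backward_step(inputs, desired, w1: Matrix, w2: Matrix,
                  prev1: Matrix, prev2: Matrix, beta: float, alpha: float) -> float:
    """One forward pass and one in-place weight update; returns the squared error."""
    middle, output = forward(inputs, w1, w2)
    # i. output errors; for the sigmoid, f' is o * (1 - o) in terms of the output o
    e_out = [o * (1 - o) * (d - o) for o, d in zip(output, desired)]
    # ii. middle-layer errors, using the weights before they are updated
    e_mid = [m * (1 - m) * sum(w2[j][i] * e_out[j] for j in range(len(w2)))
             for i, m in enumerate(middle)]
    # iii. delta rule with momentum; w2[j][i] links middle node i to output node j
    for j in range(len(w2)):
        for i in range(len(middle)):
            prev2[j][i] = beta * e_out[j] * middle[i] + alpha * prev2[j][i]
            w2[j][i] += prev2[j][i]
    for i in range(len(w1)):
        for k in range(len(inputs)):
            prev1[i][k] = beta * e_mid[i] * inputs[k] + alpha * prev1[i][k]
            w1[i][k] += prev1[i][k]
    return sum((d - o) ** 2 for o, d in zip(output, desired))


def train(samples: List[Tuple[List[float], List[float]]], w1: Matrix, w2: Matrix,
          beta: float = 1.0, alpha: float = 0.5,
          tolerance: float = 0.01, max_epochs: int = 5000) -> float:
    """iv. Present the training set repeatedly until the overall error is small."""
    prev1 = [[0.0] * len(row) for row in w1]
    prev2 = [[0.0] * len(row) for row in w2]
    error = float("inf")
    for _ in range(max_epochs):
        error = sum(backward_step(x, d, w1, w2, prev1, prev2, beta, alpha)
                    for x, d in samples)
        if error < tolerance:
            break
    return error


# Hypothetical 1-1-1 network that must learn to answer 1 for the input 1.
final_error = train([([1.0], [1.0])], [[0.0]], [[0.0]])
print(final_error)
```

The example starts from zero weights only to keep the run reproducible; with several middle nodes, random initial values are needed so that the nodes do not all learn the same thing.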
4 Results of User Authentication via Neural Network
Experiment

The objective of the latest research was to study the effectiveness of identifying
the authentic user via a neural network. The Matlab neural network toolbox ver 5.2 was
used to carry out the experiment. The experiment used two authentication methods to
gather comparative performance data: the ADALINE and the backpropagation neural
network. Each model was tested using the above data, and impostor data that was
randomly generated by the computer. Below are the results of the experiment.



4.1 ADALINE Network 

There are two sets of samples for the network, namely the training
samples and the test samples. The error rate is plotted on a graph and these errors
distinguish between the authentic user and the non-authentic user. The ADALINE
network does not suffer from the local minima problem. However, it is a tedious task
to classify the data, since the ADALINE network is not capable of classifying patterns.
The data of the authentic user is trained 100 times before the data of the
impostor is trained. Delays are used to enable the network to process one element of
data at a time. For each training run, a graph is plotted for the authentic user and
the impostor. The experiment used 20-30 impostor data sets to perform the
training. Each impostor data set was compared to that of the authentic user. Typically, an
acceptable range of variation for each feature must be defined in case there is a
difference in feature values. In this experiment, the difference between
the desired output and the actual output is calculated and taken into consideration for
the purpose of allowing an acceptable range. A sufficient value for the
acceptable range is 0.35. The user's captured data is in milliseconds and was
normalised. Thus, the value of 0.35 applies after normalisation of the data.
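One possible reading of the acceptable range is sketched below. The rule itself (accept only if every per-element difference between desired and actual output stays within 0.35) is an assumption, as are the function name and the sample values; the text only states the threshold.

```python
from typing import Sequence

ACCEPTABLE_RANGE = 0.35  # value reported in the text, on normalised data


def within_range(desired: Sequence[float], actual: Sequence[float],
                 limit: float = ACCEPTABLE_RANGE) -> bool:
    """Assumed rule: accept only if every per-element error stays within the limit."""
    return all(abs(d - a) <= limit for d, a in zip(desired, actual))


# Hypothetical normalised values, for illustration only.
close_match = within_range([0.2, -0.5, 1.1], [0.3, -0.4, 0.9])
poor_match = within_range([0.2, -0.5, 1.1], [0.3, 0.4, 0.9])
print(close_match, poor_match)
```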

Each impostor was trained against the authentic user. The graphs plotted
below show the training of the ADALINE network for each user. Each graph
represents one user and shows the steps taken during
training for that impostor and the authentic user.





Fig. 4.1(a) Error for the first impostor user (axis label: number of keys pressed)

Fig. 4.1(b) Comparison between impostor and user (axis label: number of keys pressed)

Fig. 4.1(c) The result for the ADALINE network

Above is the final result obtained after training 20 sets of data. The result
distinguishes between the unknown user, the impostor and the authentic user. The
authentic user's curve appears as a straight line because the error rate is so small.



4.2 Backpropagation 

There are two sets of samples, training samples and test samples. The technique
used was similar to that for the ADALINE network. The performance function chosen is
the sum-squared error, which measures network
performance according to the sum of squared errors. This network usually
uses the log-sigmoid transfer function, so the output generated is between 0 and 1. The
data of each user was normalised and simulated; both the authorised and unauthorised
normalised data are then kept in matrices. Later, during training, these values are
compared and calculations are performed on them. The experiment
conducted involved 30 users. The most obvious difference from ADALINE was that the
network was able to classify the patterns of samples. The value 1 classifies the
pattern as an authentic user, whereas the value 0 classifies the pattern as an
unauthorized user.




Fig. 4.2(a) Backpropagation performance 

The curved line shows the performance of the training. The straight line
shows the target, which is a sum-squared error of 1e-6. Once the performance reaches the
target, a user interface reports whether the user is Joey Rogers or an impostor.

Below is the result of the training of the backpropagation network.

TRAINGDX, Epoch 0/5000, SSE 0.418556/1e-006, Gradient 0.729983/1e-006
TRAINGDX, Epoch 20/5000, SSE 0.00186359/1e-006, Gradient 0.00512648/1e-006
TRAINGDX, Epoch 40/5000, SSE 0.00115658/1e-006, Gradient 0.00406104/1e-006
TRAINGDX, Epoch 60/5000, SSE 0.000572851/1e-006, Gradient 0.00200267/1e-006
TRAINGDX, Epoch 80/5000, SSE 0.000220868/1e-006, Gradient 0.00066173/1e-006
TRAINGDX, Epoch 100/5000, SSE 9.88475e-005/1e-006, Gradient 0.000254151/1e-006
TRAINGDX, Epoch 120/5000, SSE 4.67756e-005/1e-006, Gradient 0.000118904/1e-006
TRAINGDX, Epoch 140/5000, SSE 1.9626e-005/1e-006, Gradient 4.76147e-005/1e-006
TRAINGDX, Epoch 160/5000, SSE 7.84937e-006/1e-006, Gradient 1.81029e-005/1e-006
TRAINGDX, Epoch 180/5000, SSE 3.15447e-006/1e-006, Gradient 6.96721e-006/1e-006
TRAINGDX, Epoch 200/5000, SSE 1.28054e-006/1e-006, Gradient 2.72398e-006/1e-006
TRAINGDX, Epoch 206/5000, SSE 9.77546e-007/1e-006, Gradient 2.06044e-006/1e-006
TRAINGDX, Performance goal met.

a1 = sim(net,P)
a2 = sim(net,Pjr)
a3 = sim(net,Pimp)

The result obtained was:

a1 = 0.99928 0.00067355
a2 = 0.99928
a3 = 0.00067355

HI I am Joey Rogers! 

a1 is the value for the authentic and impostor users after testing the network. It is
calculated by subtracting the output from the desired output. The commands above produce
the output during testing of the user data. TRAINGDX is a network training function that
updates weights and biases according to gradient descent with momentum and adaptive
learning rate backpropagation. An epoch is one complete cycle of training. SSE is a network
performance function. It measures performance according to the sum of squared
errors. When the performance reaches the goal, the training stops.



5 Summary and Discussion 

The neural network was able to recognize the authorized user's pattern. However,
certain techniques are more sensitive to some set of identifiable traits in an
individual's typing pattern that another technique misses, while the sensitivities
are reversed for a different individual.

Further research should be carried out to refine the techniques used to
discriminate the data collected. Although keystroke dynamics seems like the best method
for solving user identification problems, much research work remains to be
done. This is because the degree of familiarity, emotional influence and environment
play an important role in typing patterns. However, the difference is not so obvious.
The Backpropagation network seems much more suitable as a pattern classifier
because it can solve non-linear problems and classify patterns. The
backpropagation network wins over ADALINE because it is better at generalization.

In past papers, one of the approaches to refining the methodology used along with
keystroke dynamics is to include some representation of the Euclidean distance as an
additional input to the neural network. The Euclidean distance is a measure that
compares the distances to certain clusters in order to find the shortest distance to the
center of a cluster. It is hoped that this will provide additional sensitivity, and that
keystroke dynamics will be able to breathe new ideas into identification problems.

Abstract. The paper considers conjunctive and disjunctive version space
learning as an incomplete search in complete hypothesis spaces. The
incomplete search is guided by preference biases which are implemented 
by procedures based on the instance-based boundary sets representation 
of version spaces. The conditions for tractability of this representation 
are defined. As a result we propose to use instance-based boundary sets 
as a basis for the computationally feasible application of preference biases 
to version spaces. 



1 Introduction 

Concept learning is a basic task in machine learning. It is defined under the 
assumption that any concept is a set of the instances in a domain of discourse. 
The instances are expressed in an instance language Li and their descriptions 
in Li form the extensional representation of the concept. Since the extensional 
representation can be infinite we study the concept in a language Lc of concepts. 
The concept language Lc is a set of descriptions that represent concepts inten- 
sionally, i.e., they recognise all extensional representations of the concepts in the 
instance language Li. To make this possible we define a predicate M between Li 
and Lc s.t. a description c in Lc corresponds to a description i in Li, if and only 
if the instance, represented by i, is a member of the concept, represented by c. 

The concept learning task is a quadruple $\langle L_i, L_c, M, \langle I^+, I^- \rangle\rangle$: given
languages Li and Lc, and the predicate M, the task is to find the version space of
a target concept specified by sets $I^+$ and $I^-$ of positive and negative training
instances. The version space VS is defined as the set of all the descriptions in Lc
consistent with the training sets [4]. Learning the version space VS is a complete
search in the concept language Lc: when new training instances are given, VS is
updated s.t. descriptions that incorrectly classify the instances are removed.

Classifying with version spaces is based on restriction bias [4]. A restriction 
bias means that the concept language Lc is not complete. The language Lc is 
(not) complete if it is (not) true that every concept, extensionally defined in the 
instance language Li, has an intensional description in Lc. The restriction bias 
has led to three main version-space shortcomings: 

(S1) the inability to learn concepts not present in concept languages;
(S2) the inability to handle noisy training instances.

In addition to (S1) and (S2), Haussler has shown in [1] that:

(S3) the basic version space representation and algorithm are intractable [4].
This paper avoids the shortcomings (S1) to (S3) by applying preference biases
in version-space learning. (A bias is preferential if there exists a strong preference
for certain concept descriptions over others.) Applying a preference bias requires:

(R1) a complete concept language; and
(R2) a computationally feasible procedure that implements the bias.

We fulfil requirement (R1) and overcome shortcoming (S1) by introducing
conjunctive and disjunctive extensions of two well-defined concept languages: 
union and intersection preserving languages. We define conditions when the ex- 
tensions are complete concept languages, and thus we introduce conjunctive and 
disjunctive version spaces as classical version spaces defined in these languages. 

We overcome shortcoming (S3) by representing conjunctive and disjunctive 
version spaces with instance-based boundary sets [6] that are tractable for union 
and intersection preserving languages. We show that the instance-based bound- 
ary sets can be used for building computationally feasible procedures that imple- 
ment different preference biases. Thus, we fulfil requirement (R2) and overcome 
shortcoming (S2). The latter is empirically justified by experiments. 

2 Terminology and Definitions 

Instance and concept languages Li and Lc are sets of descriptions. To search in 
Lc we structure Lc. The structure is based on the relation "more general" ($\geq$).

Definition 1. $(\forall c_1, c_2 \in L_c)((c_1 \geq c_2) \leftrightarrow (\forall i \in L_i)(M(c_1, i) \leftarrow M(c_2, i)))$.

The relation $\geq$ is a partial ordering [4]. Hence, Lc is a poset that we restrict
s.t. every subset $C \subseteq L_c$ has minimal and maximal elements.

Definition 2. $MIN(C) = \{c \in C \mid (\forall c' \in C)\,\neg(c' < c)\}$

$MAX(C) = \{c \in C \mid (\forall c' \in C)\,\neg(c' > c)\}$.

To determine when a concept description c is consistent, we define the con- 
sistency predicate. 

Definition 3. A concept description c is consistent w.r.t. sets $I^+$ and $I^-$ iff:
$cons(c, \langle I^+, I^- \rangle) \leftrightarrow ((\forall i \in I^+) M(c, i) \wedge (\forall i \in I^-) \neg M(c, i))$.

We use the cons predicate to define version spaces formally. 

Definition 4. A version space VS w.r.t. a task $\langle L_i, L_c, M, \langle I^+, I^- \rangle\rangle$ is:

$VS = \{c \in L_c \mid cons(c, \langle I^+, I^- \rangle)\}$.

Version spaces VS can be represented by minimal and maximal boundary 
sets S and G specified in definition 5 [4]. 

Definition 5. $S = MIN(VS)$

$G = MAX(VS)$.
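Definitions 1 to 5 can be illustrated by brute-force enumeration over a small finite language. The sketch below makes a simplifying assumption that is not part of the paper: each concept description is identified with the set of instances it covers, so that M(c, i) becomes set membership and "more general" becomes set inclusion. The toy language and all names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

Concept = FrozenSet[int]  # a concept is identified with the instances it covers


def cons(c: Concept, pos: Iterable[int], neg: Iterable[int]) -> bool:
    """Definition 3: c covers every positive and no negative instance."""
    return all(i in c for i in pos) and not any(i in c for i in neg)


def minimal(cs: Set[Concept]) -> Set[Concept]:
    """Definition 2 (MIN): members with no strictly more specific member."""
    return {c for c in cs if not any(d < c for d in cs)}


def maximal(cs: Set[Concept]) -> Set[Concept]:
    """Definition 2 (MAX): members with no strictly more general member."""
    return {c for c in cs if not any(d > c for d in cs)}


def version_space(lc: Set[Concept], pos: Iterable[int],
                  neg: Iterable[int]) -> Set[Concept]:
    """Definition 4: all descriptions in Lc consistent with the training sets."""
    pos, neg = list(pos), list(neg)
    return {c for c in lc if cons(c, pos, neg)}


# Toy language: all nonempty unions of the blocks {0,1}, {2} and {3}.
blocks = [frozenset({0, 1}), frozenset({2}), frozenset({3})]
lc = {frozenset().union(*combo)
      for r in range(1, len(blocks) + 1)
      for combo in combinations(blocks, r)}

vs = version_space(lc, pos=[2], neg=[3])
S, G = minimal(vs), maximal(vs)   # Definition 5
print(sorted(map(sorted, S)), sorted(map(sorted, G)))
```

Enumeration is exponential in general; it is used here only to make the definitions concrete.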
3 UPL Languages and Their Conjunctive Extensions

Union preserving languages (UPL) are specified in definition 6 [6].

Definition 6. A concept language Lc is a union preserving language (UPL) iff
Lc is a complete upper semi-lattice s.t. for every nonempty subset $C \subseteq L_c$:

(1) the least upper bound of C is in Lc; i.e., $lub(C) \in L_c$; and

(2) $(\forall i \in L_i)((\exists c \in C) M(c, i) \leftrightarrow M(lub(C), i))$.

Conjunctive extensions of UPL languages are specified below. 

Definition 7. The language CLc is the conjunctive extension of a UPL concept
language Lc if and only if:

$CL_c = \{C \mid C = c_1 \wedge \ldots \wedge c_n,\ n \geq 1,\ c_1, \ldots, c_n \in L_c\}$.

To relate languages CLc and Li we define the predicate $M_C$.

Definition 8. Consider a UPL concept language Lc and a language CLc that
is a conjunctive extension of Lc. If $C \in CL_c$ then:

$(\forall i \in L_i)(M_C(C, i) \leftrightarrow (\forall c \in C) M(c, i))$.

To determine when a conjunction $C \in CL_c$ is consistent, the consistency
predicate $cons_C$ on CLc is defined analogously to definition 3. The predicate is
used in theorem 9 to determine when a conjunctive extension CLc is complete.

Theorem 9. If CLc is the conjunctive extension of a UPL language Lc then:
$(\forall i \in L_i)(\exists c \in L_c)\, cons(c, \langle L_i - \{i\}, \{i\} \rangle) \leftrightarrow (\forall I \subseteq L_i)(\exists C \in CL_c)\, cons_C(C, \langle I, L_i - I \rangle)$.

Theorem 9 states that a conjunctive extension CLc of a UPL language Lc
is a complete language iff for every instance $i \in L_i$ there exists a description
$c \in L_c$ that covers every instance except i.
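The condition of theorem 9 and the predicate of definition 8 can be checked on a toy language. As an assumption made only for this sketch, a description is again identified with the set of instances it covers; the function names and the three-instance language are hypothetical.

```python
from typing import FrozenSet, List, Set

Concept = FrozenSet[int]


def covers_conj(conj: List[Concept], i: int) -> bool:
    """Definition 8: a conjunction covers i iff every conjunct covers i."""
    return all(i in c for c in conj)


def theorem9_condition(lc: Set[Concept], li: FrozenSet[int]) -> bool:
    """For every instance i, some description covers everything except i."""
    return all((li - {i}) in lc for i in li)


def conjunction_for(target: Set[int], li: FrozenSet[int]) -> List[Concept]:
    """Conjoin, for each instance outside the target, the description excluding it."""
    return [li - {i} for i in sorted(li - target)] or [li]


li = frozenset({0, 1, 2})
lc = {frozenset({0, 1}), frozenset({0, 2}), frozenset({1, 2}), li}

conj = conjunction_for({0}, li)
covered = {i for i in li if covers_conj(conj, i)}
print(theorem9_condition(lc, li), sorted(covered))
```

The concept covering only instance 0 has no description in this Lc, yet the conjunction of the descriptions {0, 2} and {0, 1} represents it, which is the sense in which the conjunctive extension is complete.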

4 Conjunctive Version Spaces 

Conjunctive version spaces are defined in conjunctive extensions. 

Definition 10. (Conjunctive Version Space (CVS)) Consider a task
$\langle L_i, L_c, M, \langle I^+, I^- \rangle\rangle$. If Lc is a UPL language and CLc is the conjunctive extension of Lc
then the conjunctive version space CVS of the task is defined as follows:

$CVS = \{C \in CL_c \mid cons_C(C, \langle I^+, I^- \rangle)\}$.

Since conjunctive extensions are defined in UPL languages we project conjunctive
version spaces into UPL languages. Therefore, we define conjunctive
version spaces in pure UPL languages (see theorem 13) by means of version spaces
with respect to negative instances (see notations 11 and 12).

Notation 11. If $i_n$ is the n-th instance of the set $I^-$ then the version space
$\{c \in L_c \mid cons(c, \langle I^+, \{i_n\} \rangle)\}$ is denoted by $VS_n$. A version space
$\{c \in L_c \mid (\forall i \in I^+) M(c, i)\}$ is denoted by $VS_0$.

Notation 12. P is the number of positive instances; N is the number of neg- 
ative instances; p is an index of positive instances; n is an index of negative 
instances. 

Theorem 13. Consider a task $\langle L_i, L_c, M, \langle I^+, I^- \rangle\rangle$. Then
$CVS = \bigwedge_{n=1}^{N} VS_n$.¹

By theorem 13 we learn conjunctive version spaces CVS by learning version
spaces $VS_n$ w.r.t. negative instances.



5 Instance-Based Boundary Sets 

The version space boundary sets can grow exponentially in the number of training
instances [1]. This holds for the version spaces $VS_n$: the minimal boundary
sets $S_n$ can grow exponentially in the number of positive instances. To overcome
this problem we use instance-based boundary sets for representing the version
spaces $VS_n$. We start introducing instance-based boundary sets by lemma 14 [6].

Lemma 14. Consider a task $\langle L_i, L_c, M, \langle I^+, I^- \rangle\rangle$ and its conjunctive version
space $CVS = \bigwedge_{n=1}^{N} VS_n$. Then:

$(\forall n \in 0..N)(VS_n = \bigcap_{p=1}^{P} VS_{n,p})$

where $VS_{n,p} = \{c \in L_c \mid cons(c, \langle \{i_p\}, \{i_n\} \rangle)\}$.

Lemma 14 states that each version space $VS_n$ is the intersection of simple
version spaces $VS_{n,p}$ w.r.t. positive instances. The spaces $VS_{n,p}$ allow us to define
instance-based boundary sets of the version spaces $VS_n$.

Definition 15. (The instance-based boundary sets (IBBS)) If Lc is UPL then a
version space $VS_n$ is represented by an ordered pair $\langle \langle S_{n,1}, \ldots, S_{n,P} \rangle, G_n \rangle$ where:

$S_{n,p} = MIN(VS_{n,p})$ for all $p \in 1..P$;

$G_n = MAX(VS_n)$.

IBBS are "instance-based" since they express the minimal boundary sets
of the version spaces $VS_n$ with the minimal boundary sets $S_{n,p}$ of simple version
spaces $VS_{n,p}$ associated with particular positive instances. It has been shown
in [6] that the IBBS correctly represent the version spaces $VS_n$.

¹ $\bigwedge_{i=1}^{N} S_i = \{C \mid (\forall i \in 1..N)(S_i \cap C \neq \emptyset) \wedge (C \subseteq \bigcup_{i=1}^{N} S_i)\}$



FOR each training instance i DO
    IF i is a positive instance $i_{P+1}$ THEN
        FOR n = 0 TO N DO
            $S'_{n,P+1} = MIN(\{c \in L_c \mid cons(c, \langle \{i_{P+1}\}, \{i_n\} \rangle)\})$
    IF i is a negative instance $i_{N+1}$ THEN
        FOR p = 1 TO P DO
            $S'_{N+1,p} = \{s \in S_{0,p} \mid \neg M(s, i_{N+1})\}$
        $G'_{N+1} = MAX(\{c \in L_c \mid cons(c, \langle I^+, \{i_{N+1}\} \rangle)\})$

Fig. 1. The CVS Learning Algorithm

6 The CVS Learning Algorithm 

Our learning algorithm updates conjunctive version spaces w.r.t. new training
instances. Since conjunctive version spaces are given by the version spaces $VS_n$,
their updating is reduced to updating the spaces $VS_n$. Updating the version
spaces $VS_n$ represented by IBBS is based on theorems 16 and 17 [6].

Theorem 16. Consider a task $\langle L_i, L_c, M, \langle I^+, \{i_n\} \rangle\rangle$ with version space $VS_n$
given by IBBS: $\langle \langle S_{n,1}, \ldots, S_{n,P} \rangle, G_n \rangle$, and a task $\langle L_i, L_c, M, \langle I^+ \cup \{i_{P+1}\}, \{i_n\} \rangle\rangle$
with version space $VS'_n$ given by IBBS: $\langle \langle S'_{n,1}, \ldots, S'_{n,P+1} \rangle, G_n \rangle$. If Lc is UPL
then:

$S'_{n,p} = S_{n,p}$ for all $p \in 1..P$;

$S'_{n,P+1} = MIN(\{c \in L_c \mid cons(c, \langle \{i_{P+1}\}, \{i_n\} \rangle)\})$.

Theorem 17. Consider a task $\langle L_i, L_c, M, \langle I^+, \emptyset \rangle\rangle$ with version space $VS_0$ given
by IBBS: $\langle \langle S_{0,1}, \ldots, S_{0,P} \rangle, G_0 \rangle$ and a second task $\langle L_i, L_c, M, \langle I^+, \{i_n\} \rangle\rangle$ with
version space $VS_n$ given by IBBS: $\langle \langle S'_{n,1}, \ldots, S'_{n,P} \rangle, G'_n \rangle$. If Lc is UPL then:

$S'_{n,p} = \{s \in S_{0,p} \mid \neg M(s, i_n)\}$ for all $p \in 1..P$;

$G'_n = MAX(\{c \in L_c \mid cons(c, \langle I^+, \{i_n\} \rangle)\})$.

The learning algorithm is based on theorems 16 and 17, and is given in
figure 1. If a new positive instance $i_{P+1}$ is given then the algorithm updates
every version space $VS_n$ for $n \in 0..N$ as follows:

(1) the maximal boundary set $G_n$ is not changed.

(2) the minimal boundary sets $S_{n,p}$ are not changed for $p \in 1..P$.

(3) the minimal boundary set $S'_{n,P+1}$ is generated as a set of minimal descriptions
in Lc that cover $i_{P+1}$ and do not cover the corresponding negative instance $i_n$.

From steps (1) to (3) above, it follows that the IBBS representations of $S_n$
and $G_n$ of $VS_n$ are generalised to cover the instance $i_{P+1}$. Therefore, the
conjunctions in CVS that do not cover $i_{P+1}$ are removed.

If a new negative instance $i_{N+1}$ is given then the algorithm generates only
the IBBS representation of the new version space $VS'_{N+1}$:

(4) the $S'_{N+1,p}$ part of the IBBS is generated from the minimal boundary sets $S_{0,p}$
(for all $p \in 1..P$) by removing the elements covering the instance $i_{N+1}$.

(5) the set $G'_{N+1}$ of the IBBS is generated from the maximal elements in Lc that
cover the set $I^+$ and do not cover the instance $i_{N+1}$.

From steps (4) and (5) it follows that the conjunctions in CVS that
cover $i_{N+1}$ are removed by adding to CVS the version space $VS'_{N+1}$.
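The update steps (1) to (5) can be sketched in Python over a small finite language. The sketch enumerates Lc by brute force, which is only feasible for toy examples, and it assumes that a description is identified with the set of instances it covers. The class name `CVSLearner`, its attributes and the toy language are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Concept = FrozenSet[int]  # a description is identified with the instances it covers


def cons(c: Concept, pos: Iterable[int], neg: Iterable[int]) -> bool:
    return all(i in c for i in pos) and not any(i in c for i in neg)


def minimal(cs: Set[Concept]) -> Set[Concept]:
    return {c for c in cs if not any(d < c for d in cs)}


def maximal(cs: Set[Concept]) -> Set[Concept]:
    return {c for c in cs if not any(d > c for d in cs)}


class CVSLearner:
    """Keeps S[n][p] and G[n] as in definition 15; n = 0 means no negative instance."""

    def __init__(self, lc: Set[Concept]) -> None:
        self.lc = set(lc)
        self.pos: List[int] = []
        self.neg: List[int] = []
        self.S: Dict[int, List[Set[Concept]]] = {0: []}
        self.G: Dict[int, Set[Concept]] = {}

    def add_positive(self, i: int) -> None:
        # Steps (1)-(3): G[n] and the existing S[n][p] stay; one new set per n.
        self.pos.append(i)
        for n in self.S:
            excluded = [] if n == 0 else [self.neg[n - 1]]
            self.S[n].append(
                minimal({c for c in self.lc if cons(c, [i], excluded)}))

    def add_negative(self, i: int) -> None:
        # Steps (4)-(5): build the IBBS of the new version space only.
        self.neg.append(i)
        n = len(self.neg)
        self.S[n] = [{s for s in s0p if i not in s} for s0p in self.S[0]]
        self.G[n] = maximal({c for c in self.lc if cons(c, self.pos, [i])})


# Toy language: all nonempty unions of the blocks {0,1}, {2} and {3}.
blocks = [frozenset({0, 1}), frozenset({2}), frozenset({3})]
lc = {frozenset().union(*combo)
      for r in range(1, len(blocks) + 1)
      for combo in combinations(blocks, r)}

learner = CVSLearner(lc)
learner.add_positive(2)
learner.add_negative(3)
learner.add_positive(0)
print(learner.S[1], learner.G[1])
```

Note that a positive instance never touches the sets already stored, and a negative instance only filters the sets of $VS_0$, which is what keeps the representation cheap to update.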



7 Classification Based on Restriction Biases 

If theorem 9 does not hold then the conjunctive extensions CLc are incomplete.
Hence, we have restriction bias and we can apply the unanimous vote classification
rule [4]: if all descriptions in a conjunctive version space CVS agree on a
classification of an instance then the instance gets the classification. Since CVS
is given with the version spaces $VS_n$ and $(\forall n \in [0, N])(VS_0 \supseteq VS_n)$ (see
notation 11) then the unanimous vote rule works by theorem 13 as follows: (1) if
the version space $VS_0$ agrees on the positive classification of an instance then
the instance is classified as positive; (2) if at least one version space $VS_n$ agrees
on the negative classification of an instance then the instance is classified as
negative; (3) if the cases (1) and (2) have failed then the instance is unclassified.
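The three cases of the rule can be sketched as follows. For illustration the version spaces are enumerated explicitly over a toy language in which a description is identified with the set of instances it covers; the function name, the guard against empty version spaces and the example data are assumptions.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

Concept = FrozenSet[int]


def classify(instance: int, vs0: Set[Concept],
             vs_neg: List[Set[Concept]]) -> str:
    """Unanimous vote over VS_0 and the version spaces VS_n (n >= 1)."""
    if vs0 and all(instance in c for c in vs0):
        return "positive"                      # case (1)
    if any(vs and all(instance not in c for c in vs) for vs in vs_neg):
        return "negative"                      # case (2)
    return "unclassified"                      # case (3)


# Toy language: all nonempty unions of the blocks {0,1}, {2} and {3}.
blocks = [frozenset({0, 1}), frozenset({2}), frozenset({3})]
lc = {frozenset().union(*combo)
      for r in range(1, len(blocks) + 1)
      for combo in combinations(blocks, r)}

pos, neg = [2], [3]
vs0 = {c for c in lc if all(i in c for i in pos)}
vs_neg = [{c for c in vs0 if n not in c} for n in neg]
labels = [classify(i, vs0, vs_neg) for i in range(4)]
print(labels)
```

Instances 0 and 1 stay unclassified because some descriptions in the version spaces cover them and others do not.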

8 Classification Based on Preference Biases 

If theorem 9 holds then conjunctive extensions CLc are complete. Hence, prefer- 
ence biases can be applied. Classification based on preference biases is computa- 
tionally feasible when IBBS can be used for building procedures that implement 
these biases. IBBS facilitate implementing preference biases since: 

(1) removing any minimal boundary set $S_{n,p}$ for given n and p means that the
corresponding $VS_n$ is not consistent anymore with the positive instance $i_p$;

(2) removing the IBBS of a version space $VS_n$ for a given n means that the
conjunctive version space CVS is not consistent anymore with the negative
instance $i_n$ (a corollary of theorem 13);

(3) each minimal boundary set $S_{0,p}$ is the union of the minimal boundary
sets $S_{n,p}$, i.e., $S_{0,p} = \bigcup_{n=1}^{N} S_{n,p}$ (a corollary of theorem 17).

We use properties (1) to (3) and propose two types of approaches for imple- 
menting preference biases. 



8.1 Separate-and-Conquer Approach

The separate-and-conquer approach is based on the IBBS of the minimal
boundary set $S_0$, which is a rich structure containing complete information
about the positive training data. Hence, this structure can be used to build a
large spectrum of separate-and-conquer algorithms. In this light we give two
simple separate-and-conquer algorithms that demonstrate the vitality of the
approach.

    C = ∅
    while ¬StoppingCriterion(C, I+, I-) do
        c = ∅
        for p = 1 to P do
            maxPerf = Perf(c, I+, I-)
            maxS = ∅
            for each s ∈ S_{0,p} do
                if (Perf(c ∨ s, I+, I-) > maxPerf) then
                    maxS = s
                    maxPerf = Perf(c ∨ s, I+, I-)
            c = c ∨ maxS
        C = C ∧ c
        Remove from I- all instances that are not covered by c

Fig. 2. The CLc-Separate-Conquer Algorithm



CLc-Separate-and-Conquer Algorithm. The algorithm (CLc-SCA) learns
conjunctions of target concepts in conjunctive extensions CLc (see figure 2). It
starts with an empty conjunction C. Then it "greedily" builds every conjunct c
of C from those elements of the sets $S_{0,p}$ ($p \in 1..P$) that maximise a
user-supplied function Perf in a disjunction. Hence, every conjunct c is a
disjunction of the elements so chosen from the sets $S_{0,p}$. The process of
generating conjuncts c stops when the StoppingCriterion fires.

Note that the for loops, which learn the conjuncts c, search in UPL concept
languages Lc. The outer while loop learns conjunctions C in conjunctive
extensions CLc. That is why the algorithm is labeled with CLc.
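
The loop structure of figure 2 can be sketched in Python as follows. Everything concrete here is an assumption made for illustration and is not taken from the paper: instances are tuples of nominal values, a description is a pair `(attribute index, value)`, `perf` is a simple placeholder score (covered positives minus covered negatives) rather than the m-estimate, and the `max_rounds` guard is added only so that the sketch always terminates.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[str, ...]
Description = Tuple[int, str]      # (attribute index, required value): toy encoding
Conjunct = List[Description]       # read as a disjunction of descriptions


def covers_disjunction(c: Conjunct, x: Instance) -> bool:
    return any(x[i] == v for i, v in c)


def covers_conjunction(C: List[Conjunct], x: Instance) -> bool:
    return all(covers_disjunction(c, x) for c in C)


def perf(c: Conjunct, pos: Sequence[Instance], neg: Sequence[Instance]) -> int:
    """Placeholder for Perf: covered positives minus covered negatives."""
    return (sum(covers_disjunction(c, x) for x in pos)
            - sum(covers_disjunction(c, x) for x in neg))


def clc_sca(S0: Sequence[Sequence[Description]],
            pos: Sequence[Instance],
            neg: Sequence[Instance],
            stop: Callable[[List[Conjunct], Sequence[Instance], Sequence[Instance]], bool],
            max_rounds: int = 100) -> List[Conjunct]:
    C: List[Conjunct] = []
    neg = list(neg)
    for _ in range(max_rounds):              # guard added for this sketch only
        if stop(C, pos, neg):
            break
        c: Conjunct = []
        for S0p in S0:                       # one minimal boundary set per positive
            best_perf, best_s = perf(c, pos, neg), None
            for s in S0p:
                score = perf(c + [s], pos, neg)
                if score > best_perf:
                    best_s, best_perf = s, score
            if best_s is not None:
                c = c + [best_s]             # c = c OR maxS
        C.append(c)                          # C = C AND c
        # Negatives not covered by c are already excluded by C, so drop them.
        neg = [x for x in neg if covers_disjunction(c, x)]
    return C
```

With a stopping criterion such as `lambda C, pos, neg: not neg`, the loop ends once every negative instance is excluded by some conjunct.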



DNF-Separate-and-Conquer Algorithm. The main problem with the CLc-SCA
algorithm is that the size of the conjunctions C can be exponential in the
number of positive instances [1]. To avoid this problem we propose
a DNF-separate-and-conquer algorithm (DNF-SCA) given in figure 3.

The DNF-separate-and-conquer algorithm learns disjunctions of target
concepts; the disjuncts belong to conjunctive extensions CLc (see figure 3). It
starts with an empty disjunction D. Each disjunct $C_p$ of D is initialised to
the conjunction of all elements of the corresponding set $S_{0,p}$. The
conjuncts s of $C_p$ are "greedily" removed so that the disjunct $C_p$
maximises a user-supplied function Perf. Revising a disjunct $C_p$ stops when
the StoppingCriterion fires. The disjunction D is formed when all minimal
boundary sets $S_{0,p}$ have been visited.

Note that the for loop, which learns the disjuncts $C_p$, searches in subspaces
of conjunctive extensions CLc. The disjunction D is learned in a disjunction of
these subspaces. Hence, D is formed in DNF extensions of UPL concept languages.
That is why the algorithm is labeled with DNF.

In both separate-and-conquer algorithms the user-supplied function Perf can
be any metric used in bottom-up induction. In our experiments we use the
m-estimate with m = 3 [2]. The StoppingCriterion is a predicate that becomes
true when 3 percent of the negative instances remain uncovered. The resulting
predictive accuracies for the Monks tasks [3] are given in figure 4.

    D = ∅
    for p = 1 to P do
        C_p = conjunction of all elements of the set S_{0,p}
        maxPerf = Perf(C_p, I+, I-)
        while ¬StoppingCriterion(C_p, I+, I-) do
            maxS = ∅
            for each s ∈ S_{0,p} do
                if (Perf(C_p − {s}, I+, I-) > maxPerf) then
                    maxS = s
                    maxPerf = Perf(C_p − {s}, I+, I-)
            C_p = C_p − {maxS}
        D = D ∨ C_p

Fig. 3. The DNF-Separate-Conquer Algorithm
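
A Python sketch of the DNF-SCA loop from figure 3 is given below. As before, the concrete encoding is an assumption made for illustration: descriptions are `(attribute index, value)` pairs, `perf` is a placeholder score rather than the m-estimate, and two details are added that the figure does not spell out: the inner loop stops when no removal improves the score, and repeated disjuncts are dropped (the paper mentions removing repeated disjuncts in its experimental analysis).

```python
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[str, ...]
Description = Tuple[int, str]      # (attribute index, required value): toy encoding
Disjunct = List[Description]       # read as a conjunction of descriptions


def covers_conjunct(Cp: Disjunct, x: Instance) -> bool:
    return all(x[i] == v for i, v in Cp)


def covers_dnf(D: List[Disjunct], x: Instance) -> bool:
    return any(covers_conjunct(Cp, x) for Cp in D)


def perf(Cp: Disjunct, pos: Sequence[Instance], neg: Sequence[Instance]) -> int:
    """Placeholder for Perf: covered positives minus covered negatives."""
    return (sum(covers_conjunct(Cp, x) for x in pos)
            - sum(covers_conjunct(Cp, x) for x in neg))


def dnf_sca(S0: Sequence[Sequence[Description]],
            pos: Sequence[Instance],
            neg: Sequence[Instance],
            stop: Callable[[Disjunct, Sequence[Instance], Sequence[Instance]], bool]
            ) -> List[Disjunct]:
    D: List[Disjunct] = []
    for S0p in S0:
        Cp = list(S0p)                       # conjunction of all elements of S_{0,p}
        best_perf = perf(Cp, pos, neg)
        while not stop(Cp, pos, neg):
            best_s = None
            for s in Cp:
                reduced = [t for t in Cp if t != s]
                score = perf(reduced, pos, neg)
                if score > best_perf:
                    best_s, best_perf = s, score
            if best_s is None:               # no removal helps: guard added for the sketch
                break
            Cp = [t for t in Cp if t != best_s]
        if Cp not in D:                      # drop repeated disjuncts
            D.append(Cp)
    return D
```

Because each disjunct starts from a single set $S_{0,p}$ and only shrinks, its size never exceeds the size of that set, which is the point of the DNF variant.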

8.2 Voting Classification Approach 

The voting classification approach is based on an approach given in [5]. If a new
instance has to be classified then its classification is determined by an
$\mathcal{N}$-vote of the version spaces $VS_n$ for $n \in 1..N$. The
$\mathcal{N}$-vote means that $\mathcal{N}$ version spaces $VS_n$ have to
classify the instance as positive in order to obtain the positive
classification. The procedure for classification with one version space $VS_n$
is based on two parameters S and V. The parameter S determines how many
descriptions of a particular $S_{n,p}$ have to cover an instance in order to
classify it as positive. Analogously, the parameter V determines how many
minimal boundary sets $S_{n,p}$ classifying the instance as positive are needed
so that the instance is classified as positive by the corresponding version
space $VS_n$.

The parameters $\mathcal{N}$, S and V can be adjusted globally for all $VS_n$ by
a process of trial and error proposed in [5]. The training set is divided into
two training sets. The first one (70%) is used for inducing the conjunctive
version space CVS given by the version spaces $VS_n$ represented by IBBS. The
second training set (30%) is used for adjusting the parameters $\mathcal{N}$,
S and V. The measured predictive accuracies for the Monks tasks are given in
figure 4.
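
The three nested thresholds can be illustrated with a short Python sketch. The data layout is an assumption made for the example: each version space is a list of minimal boundary sets, each boundary set is a list of descriptions (callables), and every threshold is read as "at least that many". The names `classify_by_vote`, `n_vote`, `s_min` and `v_min` are hypothetical.

```python
from typing import Callable, Sequence

Description = Callable[[object], bool]
BoundarySet = Sequence[Description]       # plays the role of one S_{n,p}
VersionSpace = Sequence[BoundarySet]      # plays the role of one VS_n


def classify_by_vote(version_spaces: Sequence[VersionSpace], instance,
                     n_vote: int, s_min: int, v_min: int) -> bool:
    """Return True if the instance obtains the positive classification."""

    def set_votes_positive(S_np: BoundarySet) -> bool:
        # Parameter S: enough descriptions of this boundary set cover the instance.
        return sum(bool(d(instance)) for d in S_np) >= s_min

    def space_votes_positive(vs: VersionSpace) -> bool:
        # Parameter V: enough boundary sets of this version space vote positive.
        return sum(set_votes_positive(S_np) for S_np in vs) >= v_min

    # The N-vote: enough version spaces vote positive.
    return sum(space_votes_positive(vs) for vs in version_spaces) >= n_vote
```

Tuning then amounts to trying combinations of the three thresholds on the held-out 30% of the training data and keeping the combination with the best accuracy.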



Algorithm            M1      M2      M3
CLc-SCA              100%    64%     95.55%
DNF-SCA              100%    64%     95.55%
$\mathcal{N}$-vote   94%     69%     92%

Fig. 4. The Predictive Accuracy on the Monks Tasks
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9 Analytical Complexity Analysis 

The analysis is made in terms of P and N (see notation 12), and of $\Sigma$ and
$\Gamma$, the largest sizes of the minimal and maximal boundary sets of the
version spaces $VS_{n,p}$.

The space complexity of a conjunctive version space with N + 1 version
spaces $VS_n$ is $O(N(P\Sigma + P))$ since the complexity of the IBBS of one
$VS_n$ is $O(P\Sigma + P)$. The space complexity of the resulting conjunctions
C and disjunctions D of the CLc-SCA and DNF-SCA algorithms is $O(P\Sigma)$
since in the worst case it is equal to the space complexity $O(P\Sigma)$ of
the IBBS of the set $S_0$.

The time complexity of the CVS learning algorithm is equal to the complexity
$O(N \cdot G(S_{n,P+1}))$ for handling positive instances plus the complexity
$O(P\Sigma + G(G_{N+1}))$ for handling negative instances, where $G(S_{n,P+1})$
and $G(G_{N+1})$ are the generation complexities of the sets $S_{n,P+1}$ and
$G_{N+1}$.

The time complexity of the unanimous positive classification is $O(P\Sigma)$
since it is based on the S-part of the IBBS of the version space $VS_0$. The
corresponding complexity in the case of negative classification is $O(NP)$
since it is based on the G-part of the IBBS of the version spaces $VS_n$.

From the complexity analysis it follows that IBBS representations of
conjunctive version spaces, conjunctions C and disjunctions D are tractable iff
$\Sigma$ and P are polynomial in relevant properties of UPL languages. This
holds for the CVS algorithms based on restriction biases. They are tractable
iff $\Sigma$, $\Gamma$, $G(S_{n,P+1})$ and $G(G_{N+1})$ are polynomial in
relevant properties of UPL languages.

10 Experimental Complexity Analysis 

The analysis is based on the experiments of our CVS systems implemented in 
JAVA2 on a Pentium-II computer. The results reported were observed during 
experiments with the Monks tasks. 

The size of the IBBS of conjunctive version spaces grows linearly with the
number of training instances. This is due to our current implementation, which
is based on the single representation trick, i.e., $L_i \subseteq L_c$. Thus,
the IBBS of conjunctive version spaces are represented with the positive
instances.

The resulting conjunctions C of the CLc-SCA algorithm contain on average
5 to 6 conjuncts. Each conjunct has 3 to 4 descriptions of Lc. The resulting
disjunctions D of the DNF-SCA algorithm contain P disjuncts. Each
disjunct has 3 to 4 descriptions of Lc. To reduce the number of disjuncts we
remove repeated disjuncts. Thus, the average number of disjuncts is 5 to 6.

The running time per training instance of the CVS learning algorithm grows
linearly between 6 and 9 ms. The running times per training instance of the
CLc-SCA and DNF-SCA algorithms are comparable. They grow quadratically
between 4 and 1200 ms. The running time for adjusting the parameters
$\mathcal{N}$, S and V for the $\mathcal{N}$-vote classification is
problematic since we use a brute-force algorithm. The time varies between
460 000 and 670 000 ms.
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11 The Dual Part of the Paper 

Intersection preserving languages, their disjunctive extensions and version spaces
can be derived by duality from the previous eight sections. Unfortunately,
space considerations preclude presenting the details.

12 Comparison with Relevant Work 

We can compare our work solely with [5] since this is the only work in applied
version spaces. The paper [5] introduces the voting classification in the context
of disjunctive version spaces. Its approach faces the same problem as the
$\mathcal{N}$-vote: the number of compound version spaces in disjunctive and
conjunctive version spaces can be exponential in the number of training
instances. Moreover, we discovered an analogous problem with the CLc-SCA
algorithm: the size of the conjunctions C can be exponential. That is why we
proposed the DNF-SCA algorithm, which successfully solves this problem.

13 Conclusion 

This paper proposes to avoid the main shortcomings (S1) to (S3) of version
spaces (see section 1) by applying preference biases. Shortcoming (S1) is
overcome by introducing conjunctive and disjunctive extensions of union and
intersection preserving languages. The condition for the extensions to be
complete concept languages is shown, and in this context conjunctive and
disjunctive version spaces are defined. Shortcoming (S3) is resolved by
representing conjunctive and disjunctive version spaces with instance-based
boundary sets, for which the conditions for tractability are analysed. It is
shown that the instance-based boundary sets can be used for building
computationally feasible procedures implementing preference biases. Thus, the
most serious shortcoming (S2) is also overcome: we can handle noisy training
data with version spaces (see the predictive accuracy for the M3 task with 5%
noise in figure 4).

References

1. Haussler, D.: Quantifying Inductive Bias: AI Learning Algorithms and Valiant's
Learning Framework. Artificial Intelligence 36 (1988) 177-221

2. Lavrač, N., Džeroski, S.: Inductive Logic Programming: Techniques and
Applications. Ellis Horwood, New York (1994)

3. Michalski, R.S., Tecuci, G.: Machine Learning: A Multistrategy Approach. Morgan
Kaufmann, San Francisco (1994)

4. Mitchell, T.: Machine Learning. McGraw-Hill (1997)

5. Sebag, M., Rouveirol, C.: Tractable Induction and Classification in First Order
Logic via Stochastic Matching. In: Proceedings of The International Joint
Conference on Artificial Intelligence. Morgan Kaufmann, San Mateo (1997) 888-893

6. Smirnov, E. N., Braspenning, P. J.: Version Space Learning with Instance-Based
Boundary Sets. In: Proceedings of The European Conference on Artificial
Intelligence. John Wiley and Sons, Chichester (1998) 460-464



Coverage-Based Semi-distance between Horn Clauses

Zdravko Markov^1 and Ivo Marinchev^2

^1 Department of Computer Science, Central Connecticut State University
1615 Stanley Street, New Britain, CT 06050, U.S.A.
markovz@ccsu.edu

^2 Faculty of Mathematics and Informatics, University of Sofia
5 James Bouchier Str., 1164 Sofia, Bulgaria
ivo@fmi.uni-sofia.bg



Abstract. In the present paper we use the approach of height functions
to define a semi-distance measure between Horn clauses. This approach
has already been discussed elsewhere in the framework of propositional and
simple first order languages (atoms). Here we prove its applicability to
Horn clauses. We use some basic results from lattice theory and introduce
a family of language independent coverage-based height functions. Then
we show how these results apply to Horn clauses. We also show an example
of conceptual clustering of first order atoms, where the hypotheses
are Horn clauses.



1 Introduction 

Almost all approaches to inductive learning are based on generalization and/or
specialization hierarchies. These hierarchies represent the hypothesis space which
in most cases is a partially ordered set under some generality ordering. The
properties of partially ordered sets are well studied in lattice theory. One
concept from this theory is widely used in inductive learning: the least
general generalization (lgg), which given two hypotheses builds their most
specific common generalization. The existence of an lgg in a hypothesis space
directly implies that this space is a semi-lattice (the lgg plays the role of
infimum). Thus the lgg-based approaches are theoretically well founded, simple
and elegant.

Lgg's exist for most of the languages commonly used in machine learning.
However, all practically applicable (i.e. computable) lgg's are based on
syntactical ordering relations. A relation over hypotheses is syntactical if it
does not account for the background knowledge and for the coverage of
positive/negative examples. For example, dropping condition for nominal
attributes, instance relation for atomic formulae and $\theta$-subsumption for
clauses are all syntactical relations. On the other hand, the evaluation of the
hypotheses built by an lgg operator is based on their coverage of
positive/negative examples with respect to the background knowledge, i.e. it is
based on semantic relations (in the sense of the inductive task). This
discrepancy is a source of many problems, of which overgeneralization is the
most serious.

The idea behind the lgg is to make a "cautious" (minimal) generalization.
However, this property of the lgg greatly depends on how similar the
hypotheses/examples used to build the lgg are. For example, there exist
elements in the hypothesis space whose lgg is the top element (the empty
hypothesis). This is another source of overgeneralization.

An obvious solution to the latter problem is to use a distance (metric) over
the hypothesis/example space in order to evaluate the similarity between the
hypotheses/examples. The basic idea is, when building an lgg, to choose the pair
of hypotheses/examples with the minimal distance between them. Thus the lgg
will be the minimal generalization over the whole set of hypotheses/examples.
Various distance measures can be used for this purpose. The best choice, however,
is a distance which corresponds to the lgg used, that is, the pair of the closest
hypotheses must produce the minimal lgg. To ensure this, the distance and the
lgg must be well coupled. Ideally such a distance exists in semi-lattices; however,
it is based on syntactical relations and, as we mentioned above, the best way to
evaluate the similarity between hypotheses is to use semantic relations. This is
a typical problem in Inductive Logic Programming [4], where the hypotheses
are usually Horn clauses which are generated by syntactical operators (e.g.
$\theta$-subsumption lgg) and evaluated by coverage-based functions.

In the present paper we use the approach of height functions to define a
semi-distance on a join semi-lattice. This approach was already discussed for
propositional and simple first order languages (atoms) in [3]. Here we prove
its applicability to Horn clauses. For this purpose we repeat some of the basic
results and further elaborate the notions introduced in [3].

The paper is organized as follows. The next section introduces some basic
notions from lattice theory used throughout the paper. Section 3 describes the
height-based approach to defining a semi-distance on a join semi-lattice. Section
4 proves the applicability of this approach to Horn clauses and Section 5 shows
an example of this. Finally, Section 6 concludes with a discussion of related
approaches and directions for future work.

2 Preliminaries 

The discussion in this section follows [3] with some modifications and elabora- 
tions (the proofs of the theorems are also skipped). 

Definition 1 (Semi-distance, Quasi-metric). A semi-distance (quasi-metric)
is a mapping $d : O \times O \to \Re$ on a set of objects O with the following
properties ($a, b, c \in O$):

1. $d(a, a) = 0$ and $d(a, b) \ge 0$.

2. $d(a, b) = d(b, a)$ (symmetry).

3. $d(a, b) \le d(a, c) + d(c, b)$ (triangle inequality).

Definition 2 (Order preserving semi-distance). A semi-distance
$d : O \times O \to \Re$ on a partially ordered set $(O, \preceq)$ is order
preserving iff for all $a, b, c \in O$ such that $a \preceq b \preceq c$ it
follows that $d(a, b) \le d(a, c)$ and $d(b, c) \le d(a, c)$.
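
The three properties of Definition 1 can be checked by brute force on a finite sample of objects. The sketch below is an illustration only; the function name `is_semi_distance` and the tolerance parameter are assumptions. It also shows why a semi-distance is weaker than a metric: distinct objects may be at distance zero.

```python
from itertools import product
from typing import Callable, Sequence


def is_semi_distance(objects: Sequence, d: Callable[[object, object], float],
                     tol: float = 1e-12) -> bool:
    """Check properties 1-3 of Definition 1 on a finite sample of objects."""
    # Property 1: d(a, a) = 0 and d(a, b) >= 0.
    if any(abs(d(a, a)) > tol for a in objects):
        return False
    for a, b in product(objects, repeat=2):
        if d(a, b) < -tol:
            return False
        # Property 2: symmetry.
        if abs(d(a, b) - d(b, a)) > tol:
            return False
    # Property 3: triangle inequality.
    for a, b, c in product(objects, repeat=3):
        if d(a, b) > d(a, c) + d(c, b) + tol:
            return False
    return True


# Difference in length is a semi-distance on strings: "ab" and "cd" are
# distinct objects at distance 0, which a metric would not allow.
length_gap = lambda a, b: abs(len(a) - len(b))
```

A passing check on a sample does not prove the properties for the whole set O; it only helps to catch violations early.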

Definition 3 (Join/Meet semi-lattice). A join/meet semi-lattice is a
partially ordered set $(A, \preceq)$ in which every two elements $a, b \in A$
have an infimum/supremum.

Definition 4 (Diamond inequality). Let $(A, \preceq)$ be a join semi-lattice. A
semi-distance $d : A \times A \to \Re$ satisfies the diamond inequality iff the
existence of $\sup\{a, b\}$ implies the following inequality:
$d(\inf\{a, b\}, a) + d(\inf\{a, b\}, b) \le d(a, \sup\{a, b\}) + d(b, \sup\{a, b\})$.

Definition 5 (Size function). Let $(A, \preceq)$ be a join semi-lattice. A mapping
$s : A \times A \to \Re$ is called a size function if it satisfies the following
properties:

S1. $s(a, b) \ge 0$, $\forall a, b \in A$ and $a \preceq b$.

S2. $s(a, a) = 0$, $\forall a \in A$.

S3. $\forall a, b, c \in A$ such that $a \preceq c$ and $c \preceq b$ it follows
that $s(a, b) \le s(a, c) + s(c, b)$ and $s(c, b) \le s(a, b)$.

S4. Let $c = \inf\{a, b\}$, where $a, b \in A$. For any $d \in A$ such that
$a \preceq d$ and $b \preceq d$ it follows that
$s(c, a) + s(c, b) \le s(a, d) + s(b, d)$.

Consider for example the partially ordered set of first order atoms under
$\theta$-subsumption. A size function $s(a, b)$ on this set can be defined as
the number of different functional symbols (a constant is considered a
functional symbol of arity zero) occurring in the substitution $\theta$ mapping
a onto b ($a\theta = b$). A family of similar size functions is introduced in
[1], where they are called a size of substitution. Although well defined, these
functions do not account properly for the variables in the atoms and
consequently cannot be used with non-ground atoms.

Theorem 1. Let $(A, \preceq)$ be a join semi-lattice and s be a size function.
Let also $d(a, b) = s(\inf\{a, b\}, a) + s(\inf\{a, b\}, b)$. Then d is a
semi-distance on $(A, \preceq)$.

A widely used approach to define a semi-distance is based on an order
preserving size function and the diamond inequality instead of property S4. The
use of property S4, however, is more general because otherwise we must assume
that (1) all intervals in the lattice are finite and (2) if two elements have an
upper bound they must have a least upper bound (supremum) too. An illustration
of this problem is shown in Figure 1, where $a_3$ is an upper bound of $b_1$ and
$b_2$ and $e = \sup\{b_1, b_2\}$. Generally the interval $[e, a_3]$ may be
infinite or e may not exist. This, however, does not affect our definition of
semi-distance.

Further, a size function can be defined by using the so-called height
functions. The approach of height functions has the advantage that it is based
on estimating the object itself rather than its relations to other objects.

Definition 6 (Height function). A function h is called height of the elements
of a partially ordered set $(A, \preceq)$ if it satisfies the following two
properties:

H1. For every $a, b \in A$, if $a \preceq b$ then $h(a) \le h(b)$ (isotone).

H2. For every $a, b \in A$, if $c = \inf\{a, b\}$ and $d \in A$ is such that
$a \preceq d$ and $b \preceq d$, then $h(a) + h(b) \le h(c) + h(d)$.

Fig. 1. A semi-lattice structure

Theorem 2. Let $(A, \preceq)$ be a join semi-lattice and h be a height function.
Let $s(a, b) = h(b) - h(a)$, $\forall a \preceq b \in A$. Then s is a size
function on $(A, \preceq)$.

Corollary 1. Let $(A, \preceq)$ be a join semi-lattice and h be a height
function. Then the function $d(a, b) = h(a) + h(b) - 2h(\inf\{a, b\})$,
$\forall a, b \in A$, is a semi-distance on $(A, \preceq)$.
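
Corollary 1 can be tried out on a simple lattice that is not taken from the paper: finite sets ordered by inclusion, with the intersection as infimum and the cardinality as height. Cardinality is isotone (H1) and satisfies H2, because $|a| + |b| = |a \cap b| + |a \cup b|$ and any common upper bound contains $a \cup b$. The function name `height_distance` is an assumption.

```python
from typing import Callable, FrozenSet, TypeVar

T = TypeVar("T")


def height_distance(a: T, b: T,
                    height: Callable[[T], float],
                    inf: Callable[[T, T], T]) -> float:
    """d(a, b) = h(a) + h(b) - 2 h(inf{a, b}) as in Corollary 1."""
    return height(a) + height(b) - 2 * height(inf(a, b))


def set_inf(a: FrozenSet, b: FrozenSet) -> FrozenSet:
    return a & b          # infimum under set inclusion


a = frozenset({1, 2, 3})
b = frozenset({2, 3, 4})
d_ab = height_distance(a, b, len, set_inf)   # 3 + 3 - 2 * 2
```

In this toy lattice the resulting distance is the size of the symmetric difference of the two sets, a familiar distance, which gives some intuition for the general construction.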

3 Semantic Semi-distance on Join Semi-lattices 

In this section we briefly outline the approach to defining a semantic semi- 
distance on join semi-lattices originally introduced in [3]. 

Let A be a set of objects and let $\preceq_1$ and $\preceq_2$ be two binary
relations on A. Let also $\preceq_1$ be a partial ordering and $(A, \preceq_1)$
a join semi-lattice.

Definition 7 (Ground elements of a join semi-lattice (GA)). GA is the set of
all maximal elements of A w.r.t. $\preceq_1$, i.e.
$GA = \{a \mid a \in A \text{ and } \neg\exists b \in A : a \prec_1 b\}$.

Definition 8 (Ground coverage). For every $a \in A$ the ground coverage of a
w.r.t. $\preceq_2$ is $S_a = \{b \mid b \in GA \text{ and } a \preceq_2 b\}$.

The ground coverage $S_a$ can be considered as a definition of the semantics
of a. Therefore we call $\preceq_2$ a semantic relation by analogy to the
Herbrand interpretation in first order logic used to define the semantics of a
given term. The other relation involved, $\preceq_1$, is called a constructive
(or syntactic) relation because it is used to build the lattice from a given
set of ground elements GA.

The basic idea of our approach is to use these two relations, $\preceq_1$ and
$\preceq_2$, to define the semi-distance. According to Corollary 1 we use the
syntactic relation $\preceq_1$ to find the infimum and the semantic relation
$\preceq_2$ to define the height function h. The advantage of this approach is
that in many cases there exists a proper semantic relation which, however, is
intractable, computationally expensive or even not a partial order, which makes
it impossible to use it as a constructive relation too (an example of such a
relation is logical implication). Then we can use another, simpler relation as
a constructive one (to find the infimum) and still make use of the semantic
relation (in the height function).

Not any two relations, however, can be used for this purpose. The following
theorem states the necessary conditions for two relations to form a correct
height function.

Theorem 3. Let A be a set of objects and let $\preceq_1$ and $\preceq_2$ be two
binary relations in A such that:

1. $\preceq_1$ is a partial order and $(A, \preceq_1)$ is a join semi-lattice.

2. For every $a, b \in A$, if $a \preceq_1 b$ then $|S_a| \ge |S_b|$ ^1.

3. For every $a, b \in A$ and $c = \inf\{a, b\}$ such that there exists
$d = \sup\{a, b\}$, one of the following must hold:

C1. $|S_d| < |S_a|$ and $|S_d| < |S_b|$
C2. $|S_d| = |S_a|$ and $|S_c| = |S_b|$
C3. $|S_d| = |S_b|$ and $|S_c| = |S_a|$

Then there exists a family of height functions $h(a) = x^{-|S_a|}$, where
$a \in A$, $x \in \Re$ and $x \ge 2$.

Proof.

1. Let $a, b \in A$, $a \preceq_1 b$. Then by the assumptions $|S_a| \ge |S_b|$
and hence $h(a) \le h(b)$.

2. Let $a, b \in A$, $c = \inf\{a, b\}$ and $d = \sup\{a, b\}$.

(a) Assume that C1 is true. Then $|S_d| < |S_a|$ and $|S_d| < |S_b|$
$\Rightarrow$ $|S_a| \ge |S_d| + 1$ and $|S_b| \ge |S_d| + 1$ $\Rightarrow$
$-|S_a| \le -|S_d| - 1$ and $-|S_b| \le -|S_d| - 1$. Hence
$h(a) + h(b) = x^{-|S_a|} + x^{-|S_b|} \le 2x^{-|S_d|-1} \le x \cdot x^{-|S_d|-1}
= x^{-|S_d|} = h(d) \le h(c) + h(d)$.

(b) Assume that C2 is true. Then $|S_d| = |S_a|$ and $|S_c| = |S_b|$. Hence
$h(a) + h(b) = h(c) + h(d)$.

(c) Assume that C3 is true. Then $|S_d| = |S_b|$ and $|S_c| = |S_a|$. Hence
$h(a) + h(b) = h(c) + h(d)$.

4 Coverage-Based Semi-distance between Horn Clauses 

Within the language of Horn clauses we use $\theta$-subsumption for the
constructive relation $\preceq_1$ and logical implication (semantic entailment)
for the semantic relation $\preceq_2$.

Definition 9 ($\theta$-subsumption). Let a and b be Horn clauses. Then a
$\theta$-subsumes b, denoted $a \preceq_\theta b$, iff there exists a
substitution $\theta$ such that $a\theta \subseteq b$ (the clauses are
considered as sets of literals).



^1 Generally an isotone property is required here. However, we skip the other
case, $|S_a| \le |S_b|$, since it is analogous.

Under $\theta$-subsumption a set of Horn clauses with the same predicates at
their heads (same functors and arity) forms a join semi-lattice, where the join
operator is the $\theta$-subsumption-based least general generalization
($lgg_\theta$). Further, we will show that $\theta$-subsumption and logical
implication can be used to define a correct height function on this
semi-lattice, which in turn implies the existence of a coverage-based
semi-distance between Horn clauses.

Definition 10 (Model). A set of ground literals which does not contain a
complementary pair is called a model. Let M be a model, c a clause, and C
the set of all ground clauses obtained by replacing the variables in c by ground
terms. M is a model of c iff each clause in C contains at least one literal from M.

Definition 11 (Semantic entailment). Let $f_1$ and $f_2$ be well-formed
formulae. $f_1$ semantically entails $f_2$, denoted $f_1 \models f_2$ (or
$f_1 \preceq_{\models} f_2$), iff every model of $f_1$ is a model of $f_2$.

Corollary 2. Let a and b be clauses such that $a \preceq_\theta b$. Then
$S_a \supseteq S_b$ and $|S_a| \ge |S_b|$.

Proof. Let a and b be clauses and let a $\theta$-subsume b. According to
Definitions 9 and 10, a semantically entails b, i.e. $a \models b$. Then
according to Definition 8, $S_a \supseteq S_b$ and $|S_a| \ge |S_b|$.

Now we will show that the two assumptions of Theorem 3 hold:

1. Let a and b be clauses and let $a \preceq_\theta b$. Then by Corollary 2
$|S_a| \ge |S_b|$.

2. Let $d = \sup\{a, b\}$ w.r.t. $\preceq_\theta$. Then $a \preceq_\theta d$,
$b \preceq_\theta d$, and by Corollary 2 $|S_d| \le |S_a|$ and $|S_d| \le |S_b|$.
Further, we will show that actually $|S_d| < |S_a|$ and $|S_d| < |S_b|$. First,
we assume that for any two clauses $c_1$ and $c_2$, if $S_{c_1} = S_{c_2}$ then
$c_1 = c_2$. Thus, in fact, instead of clauses we use equivalence classes of
clauses w.r.t. $\preceq_2$. Let $x \in S_a \,\triangle\, S_b$ (symmetric
difference). Assume now that $x \in S_d$. Then by Corollary 2
$S_d \subseteq S_a$ and $S_d \subseteq S_b$, that is $x \in S_a \cap S_b$, which
is a contradiction. Hence $x \notin S_d$, i.e. $S_d \subset S_a$ and
$S_d \subset S_b$, i.e. $|S_d| < |S_a|$ and $|S_d| < |S_b|$.

Then according to Corollary 1 the following function is a semi-distance:

$d(a, b) = x^{-|S_a|} + x^{-|S_b|} - 2x^{-|S_{lgg_\theta(a,b)}|}$

where a and b are Horn clauses and $S_a$, $S_b$ and $S_{lgg_\theta(a,b)}$ are
models of a, b and $lgg_\theta(a, b)$.
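
Once the three coverage sets are known, the semi-distance is a one-line computation. The sketch below assumes the coverages have already been computed and are passed in as Python sets of ground atoms (written here as strings); how to compute them from clauses and background knowledge is outside the sketch. The function name `coverage_semi_distance` is hypothetical.

```python
from typing import AbstractSet


def coverage_semi_distance(cov_a: AbstractSet, cov_b: AbstractSet,
                           cov_lgg: AbstractSet, x: float = 2.0) -> float:
    """d(a, b) = x^-|S_a| + x^-|S_b| - 2 x^-|S_lgg(a,b)| with x >= 2."""
    if x < 2:
        raise ValueError("the height functions h(a) = x^-|S_a| require x >= 2")
    # The lgg generalizes both clauses, so it must cover at least what they cover.
    if not (cov_a <= cov_lgg and cov_b <= cov_lgg):
        raise ValueError("coverage of the lgg must include both coverages")

    def height(cov: AbstractSet) -> float:
        return x ** (-len(cov))

    return height(cov_a) + height(cov_b) - 2 * height(cov_lgg)


s_a = {"memb(1,[1])"}
s_b = {"memb(2,[2])"}
s_lgg = s_a | s_b | {"memb(a,[a])"}
d_ab = coverage_semi_distance(s_a, s_b, s_lgg)   # 0.5 + 0.5 - 2 * 0.125
```

The more ground atoms the lgg covers beyond those of a and b, the closer the distance gets to $h(a) + h(b)$, which penalises pairs whose generalization is too broad.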

5 Example 

To illustrate the semi-distance between Horn clauses we use the inductive
algorithm described in [3, 2]. The algorithm starts with a given set of examples
(ground atoms) GA and builds a hierarchy of Horn clauses covering these
examples (i.e. a partial lattice, where GA is the set of maximal elements of the
lattice). The algorithm is as follows:

1. Initialization: G = GA, C = GA;

2. If $|C| = 1$ then exit;

3. $T = \{h \mid h = lgg_\theta(a, b), (a, b) = \arg\min_{a, b \in C} d(a, b)\}$;

4. $DC = \{h \mid h \in C \text{ and } \exists h_{min} \in T : h_{min} \preceq_2 h\}$;

5. $C = C \setminus DC$;

6. $G = G \cup T$, $C = C \cup T$, go to step 2.
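
The six steps can be sketched in Python on a much simpler language than Horn clauses. The following is a toy stand-in and not the authors' implementation: hypotheses are attribute-value tuples, the lgg is obtained by dropping conditions (replacing differing values by a wildcard), the same syntactic cover test is used in place of the semantic relation, and ties in step 3 are broken by taking the first minimal pair, so that T contains a single element per iteration.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Hypothesis = Tuple[str, ...]
WILDCARD = "?"


def lgg(a: Hypothesis, b: Hypothesis) -> Hypothesis:
    """Dropping-condition lgg (toy stand-in for lgg_theta)."""
    return tuple(u if u == v else WILDCARD for u, v in zip(a, b))


def covers(h: Hypothesis, e: Hypothesis) -> bool:
    """True if h covers e (toy stand-in for the semantic relation)."""
    return all(u == WILDCARD or u == v for u, v in zip(h, e))


def distance(a: Hypothesis, b: Hypothesis,
             ground: Sequence[Hypothesis], x: float = 2.0) -> float:
    def height(h: Hypothesis) -> float:
        return x ** (-sum(1 for g in ground if covers(h, g)))
    return height(a) + height(b) - 2 * height(lgg(a, b))


def build_hierarchy(ground: Sequence[Hypothesis]
                    ) -> Tuple[List[Hypothesis], List[Hypothesis]]:
    G, C = list(ground), list(ground)                      # step 1
    while len(C) > 1:                                      # step 2
        a, b = min(combinations(C, 2),                     # step 3 (first minimal pair)
                   key=lambda p: distance(p[0], p[1], ground))
        t = lgg(a, b)
        C = [h for h in C if not covers(t, h)]             # steps 4 and 5
        C.append(t)                                        # step 6
        if t not in G:
            G.append(t)
    return G, C


examples = [("a", "x"), ("a", "y"), ("b", "x")]
G, C = build_hierarchy(examples)
```

Since every iteration removes at least the two chosen hypotheses from C and adds one, the loop terminates. With Horn clauses, `lgg` and `covers` would have to be replaced by $lgg_\theta$ and a coverage test with respect to background knowledge.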

We use 10 instances of the member predicate and supply them as a GA set 
to our algorithm. Figure 2 shows the lattice structure built upon this set of 
examples. The two successors of the top element form the well known definition 
of member (the recursive clause contains a redundant literal) . The generated tree 
structure can be seen as an example of conceptual clustering of first order atoms, 
where the hypotheses are Horn clauses. 



[Figure: lattice of Horn clauses built over the ten ground instances
memb(1,[3,1]), memb(1,[2,3,1]), memb(2,[3,2]), memb(a,[b,a,b]), memb(b,[c,b]),
memb(a,[a,b]), memb(b,[b]), memb(2,[2]), memb(1,[1]) and memb(a,[a])]

Fig. 2. Hypothesis space for the instances of the member predicate



A major problem in applying our algorithm is clause reduction. This is
because, although finite, the length of the $lgg_\theta$ of n clauses can grow
exponentially with n. Some well-known techniques for avoiding this problem are
discussed in [4]. By placing certain restrictions on the hypothesis language the
number of literals in the $lgg_\theta$ clause can be limited by a polynomial
function independent of n. Currently we use ij-determinate clauses in our
experiments (actually 22-determinate).
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6 Conclusion 

Distance measures are widely used in machine learning, pattern recognition,
statistics and other related areas. Most of the distances in these areas are based
on attribute-value (or feature-value) languages and further elaborate well known
distances in feature spaces (e.g. Euclidean distance, Hamming distance etc.).
Recently a lot of attention has been paid to studying distance measures in first
order languages. The basic idea is to apply the highly successful instance based
algorithms to relational data described in the much more expressive language of
first order logic. Various approaches have been proposed in this area. Some of the
most recent ones are [1, 5, 6, 7]. These approaches, as well as most of the others,
define a simple metric on atoms and then extend it to sets of atoms (clauses
or models) using the Hausdorff metric or other similarity functions. Because of
the complexity of the functions involved and problems with the computability
of the models these approaches are usually computationally hard. Compared to
the other approaches our approach has two basic advantages:

— It is language independent, i.e. it can be applied both within propositional 
(attribute-value) languages and within first order languages. 

— It allows consistent integration of generalization operators with a semantic 
distance measure. This makes the approach particularly suitable for induc- 
tive algorithms, such as the one discussed in Section 5. 

We see the following directions for future work: 

— Particular attention should be paid to the clause reduction problem when
using the language of Horn clauses. Other lgg operators, not based on
$\theta$-subsumption, should be considered too.

— Practical learning data often involve numeric attributes. In this respect
proper relations, lgg's and covering functions should be investigated in order
to extend the approach to handling numeric data.

— More experimental work should be done to investigate the applicability of 
the proposed algorithm in real domains. 
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Abstract. A method is proposed for supervised learning to classify bit 
strings for three classes. The learner was modeled by two formal con- 
cepts: transformation system and stability optimization. Even though a 
small set of short examples were used in the training stage, all bit strings 
of any length were classified correctly in the online recognition stage. The 
learner successfully learned to devise a way by means of metric calcula- 
tions to classify bit strings according to 3-parity-ness while the learner 
was never told the concept of 3-parity-ness. 

Keywords: supervised learning, unifying metric approach, stability quo- 
tient, stability optimization, transformation systems 



1 Introduction 

In [3], a general model was proposed for inductive learning in two classes. It
solved the unsupervised parity problem of unbounded length beautifully with
100% accuracy. No other general learning algorithm, including neural networks,
has been able to achieve that result because they all assume a fixed normed vector 
space throughout the learning process. This paper extends the supervised part 
of [3] for learning in three classes. This line of the metric approach to learning has 
been applied to the benchmark parity problem [3], supervised classification of 
chromosomes [6], unsupervised classification of chromosomes [1], classifications 
of Irises and Pima Indians Diabetes, etc. [2]. Wong, Shen, and Wong [7] used 
it for texture classification and did experiments on hundreds of training objects 
at a time. Liang and Clarson used it on brainwaves [4]. The approach is related 
to Fisher’s discriminant [5, p. 115], which maximizes the distance between class 
centers while at the same time minimizing the within-class scatters. 

A pattern language describes a set P of objects. The object set is divided into
exactly three mutually exclusive subsets, $Q_1, Q_2, Q_3 \subset P$. Presumably,
each of these subsets is a class of objects. The teacher supplies three finite
training groups of objects to the learning agent. The union of the training
groups $Q_1 \cup Q_2 \cup Q_3$ is called the training set. The agent then
autonomously devises a way to distinguish the three sets $Q_1$, $Q_2$ and $Q_3$
by means of metric calculations. Hence, the next time an unknown object
$p \in P$ is presented to the agent, it will be able to classify it as
belonging to $Q_1$, $Q_2$ or $Q_3$. The learner accomplishes this by deriving a
stable metric space to separate the training groups. A stable metric space is
one containing well-separated, compact clusters.

2 The Model 

Let $T = (P, S)$ be a transformation system, where $P$ is an underlying set of
structural objects described by a pattern language.

Let $s = x \leftrightarrow y$ be a substitution operation, where $x$ and $y$ are subobjects. An
object $X$ can be transformed to the object $Y$ via rule $s$ by matching a subobject $x$
from $X$ and replacing it with $y$. Substitution rules are bidirectional in that the
substitution of subobject $x$ by subobject $y$ implies that the substitution of $y$
by $x$ is also possible. When $x = \theta$ is empty, the operation is called insertion.
When $y$ is empty, the operation is called deletion. $S = (s_1, s_2, \ldots, s_m)$ is a list
of $m$ bidirectional substitution operations.

Now we introduce weights to the substitution rules. With each substitution $s_i$,
we associate a weight $w_i$, $w_i \ge 0$, so that it costs $w_i$ to operate $s_i$. Let
$W = (w_1, w_2, \ldots, w_m)$ and $\Delta_W(p_1, p_2)$ = the smallest total cost to transform $p_1$
into $p_2$. When all the weights are 1, $\Delta$ simply counts the minimum number of
operations required to transform one object into another.

Definition 1. The average intra-group distance for a training group k is

$$\rho_k(\Delta_W) = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{i-1} \Delta_W(q_i, q_j),$$

where $q_i \in Q_k$ and the size of $Q_k$ is $n$.

Given a specific training group and a specific distance function, $\rho$ returns the
average distance within a group of training objects. Note that for this formula
to work, there must be at least two training objects. When $n$ is equal to 1, we
trivially define $\rho = 0$. We may drop the subscript or the input argument for $\rho$
when it is clear from the context which subscript or argument is meant.

Definition 2. The average inter-group distance between groups k and h is

$$\upsilon_{k,h}(\Delta_W) = \frac{1}{n\,n'} \sum_{i=1}^{n} \sum_{j=1}^{n'} \Delta_W(q_i, r_j),$$

where $q_i \in Q_k$, $r_j \in Q_h$ and the size of $Q_h$ is $n'$.

Here $n'$ can equal 1. In essence, these two rather standard definitions capture
the idea of the average distance of a distance table where distances are listed
between pairs of objects.

Definition 3. The stability quotient for three groups, 1, 2, and 3, is $Z(\Delta_W)$.
[The displayed formula for $Z(\Delta_W)$ is lost in the source; the surviving fragment
shows the inter-group averages $\upsilon_{1,2}$ and $\upsilon_{2,3}$ as denominators.]
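The two averages of Definitions 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function names are ours, and `toy_distance` is a hypothetical stand-in for $\Delta_W$ (the paper's distance is the minimum weighted transformation cost, which is not implemented here).

```python
from itertools import combinations


def avg_intra(group, dist):
    """Average distance over all unordered pairs inside one group (Definition 1).

    Returns 0.0 for a group with fewer than two objects, as the text stipulates.
    """
    if len(group) < 2:
        return 0.0
    pairs = list(combinations(group, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def avg_inter(group_k, group_h, dist):
    """Average distance over all cross pairs between two groups (Definition 2)."""
    total = sum(dist(a, b) for a in group_k for b in group_h)
    return total / (len(group_k) * len(group_h))


def toy_distance(a, b):
    """Hypothetical stand-in for Delta_W: difference in string length."""
    return abs(len(a) - len(b))


rho = avg_intra(["0", "000", "00000"], toy_distance)
upsilon = avg_inter(["0"], ["00", "0000"], toy_distance)
```

Any distance function with the same two-argument signature can be passed in, so the same helpers would serve once a real $\Delta_W$ is available.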
The learning algorithm or strategy is extremely simple. The stability quotient
serves as the objective function of an optimization procedure so that we can
simultaneously minimize the within-group distances and maximize the between-group
distances. Obviously, we would like to configure the topology in such a
way that, within a group, objects are close to each other, while at the same
time they are far from objects of other groups. The goal is to try to keep all
intra-group distances equal to zero and none of the inter-group distances equal
to zero.

Definition 4. The stability optimization is to minimize

$$Z(\Delta_W)$$

subject to the constraint that $\sum_{i=1}^{m} w_i = 1$.

In other words, we want $Z$ to get as close to zero as possible since negative costs
are not allowed.

Now, we are ready to consider the evolving nature of this system. We begin
with the given $P$, an underlying set of structural objects; $S_0$, the initial given
set of substitution operations; and $G$, a generator. The generator systematically
generates new substitution operations from previous substitution operations.
These new substitutions are called macros. At each macro generation step $t > 0$,
we have $S_t = S_{t-1} \cup G(S_{t-1})$ so that $S_0 \subset S_1 \subset S_2 \subset \cdots$. At each step $t > 0$,
we associate with the current transformation system $(P, S_t)$ its own stability
optimization having its own $Z_t$. There is a simple way to test whether or not
the current $Z_t$ is satisfactory after arriving at a certain step $t$. We suspend the
optimization loop at that step and try out the current optimum solution on
the training set as well as on the test set. If the resulting classifications are
satisfactory, we go to the on-line recognition stage. If not, we continue further
until another stable step is suspected. Then we repeat the cycle of testing. In
this way, convergence can, to some extent, be verified experimentally.

Throughout the adaptation process, P and G are fixed. From an implemen- 
tational time-complexity point of view, we have, in effect, added another loop 
on top of the optimization loop (Figure 1). The optimization loop tries out dif- 
ferent cost vectors while the macro generation loop tries out larger and larger 
sets of macro substitution operations. Each stability optimization corresponds 
to a family of metric spaces. The number of times through the macros genera- 
tion loop corresponds to the number of families being examined by the agent. 
The stopping criterion has always been the same: loop until it finds a stable 
metric space. Given enough time, the system can always discover some stable 
classes even though they may not be the ones that the human designer had in 
mind. Alternatively, the human designer can preset a time limit, after which the 
training process will be halted and new training objects can be introduced into 
the training set and/or some old training objects can be taken out of it. In the 
ideal case, stable means Z = 0. Otherwise, stable means that the groups in the 
metric space form well-defined clusters/blobs to the satisfaction of the learning 
agent, judging by, for example, the mis-classification rate in the test run. Then, 
the entire learning process stops and the agent is ready to go on-line to deal with
any of the objects in the specified environment.



Loop on set of substitution operations (t-loop).
    Optimization on weight vector until the metric space is stable.
        Distance calculation of two objects, $\Delta_W(p_1, p_2)$.
End loops

Fig. 1. Control structure of the supervised inductive learning model
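The nested control structure of Fig. 1 can be sketched as a small driver loop. This is a skeleton under stated assumptions, not the authors' implementation: `train`, `generate`, `optimize` and `is_stable` are hypothetical names, and the toy callables below are placeholders that only exercise the control flow (no real stability optimization is performed).

```python
def train(s0, generate, optimize, is_stable, max_steps=10):
    """Outer t-loop: grow the rule set until the inner optimization is stable.

    generate(rules) -> iterable of new rules (the macro generator G)
    optimize(rules) -> (weights, z), the best weight vector found and its Z value
    is_stable(z)    -> True when the metric space is judged stable
    Returns (t, rules, weights), or None if max_steps is exhausted.
    """
    rules = list(s0)
    for t in range(max_steps + 1):
        weights, z = optimize(rules)          # inner optimization loop
        if is_stable(z):
            return t, rules, weights
        new_rules = [r for r in generate(rules) if r not in rules]
        rules = rules + new_rules             # S_t = S_{t-1} U G(S_{t-1})
    return None


# Toy placeholders (assumptions for illustration only).
def toy_generate(rules):
    return [max(rules) + 1]


def toy_optimize(rules):
    uniform = [1.0 / len(rules)] * len(rules)  # weights summing to 1
    return uniform, max(0, 3 - len(rules))     # pretend Z reaches 0 at 3 rules


result = train([0, 1], toy_generate, toy_optimize, lambda z: z == 0)
```

The `max_steps` bound plays the role of the preset time limit mentioned in the text.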



After the training phase is over, one can prepare for the recognition stage. 
We store the shortest object from each training group as the representative of 
the group. Also, we store the most stable substitution cost vector W* at the end 
of the training process. When an unknown object is presented to the system, its 
distances to the representatives will be calculated using the weights W*. It can 
then be classified according to the nearest neighbor rule. 

3 Application 

Now we apply the model to the triple parity problem on bit strings. The idea
of triple parity is a generalization of the usual parity problem. The usual parity
problem is to classify strings as either even parity or odd parity. In the triple
parity case, we first count the number of 1's in a bit string. Then we divide
the count by 3 and get the remainder. If the remainder is 0, then the string
belongs to group 1; if 1, then group 2; if 2, then group 3. The underlying set of
structural objects $P$ is the set of all finite strings of 0's and 1's. The initial set
of substitution operations is

$$S_0 = \{\, 0 \leftrightarrow \theta,\ 1 \leftrightarrow \theta \,\},$$

where $\theta$ is the null string; i.e., we have two rules: insertion (or deletion) of a 0
and insertion (or deletion) of a 1. The generator is

$$G(S) = \{\, ab \leftrightarrow \theta \mid a \in \{0, 1\},\ b \leftrightarrow \theta \in S \,\}.$$
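To make the generator concrete, the following sketch builds the successive rule sets. As a representational assumption, each rule $x \leftrightarrow \theta$ is stored simply as its non-empty side `x`; the function name `generate` is ours.

```python
def generate(rules):
    """G(S): prefix every existing rule body with 0 and with 1."""
    return {a + b for a in "01" for b in rules}


s0 = {"0", "1"}                 # insertion/deletion of a single 0 or 1
s1 = s0 | generate(s0)          # adds 00, 01, 10, 11
s2 = s1 | generate(s1)          # adds the eight three-bit rules

rule_counts = (len(s0), len(s1), len(s2))
```

Under this representation the rule sets grow to 2, 6 and 14 rules, so that $S_0 \subset S_1 \subset S_2$.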

Consider the training set in Fig. 2. There are three training groups for three
classes of objects. Each class is represented by only two or three relatively short
examples. The set $P$ is infinite, and the length of its members is unbounded.
We shall see how the learner proceeds step by step to discover the idea of triple
parity.

At step $t = 0$, the learner can find no satisfactory weight vector for classification.
Given the three training groups and the two insertion operations,
no matter what $W$ we tried, $Z$ is a relatively large value. There are too many
mis-classifications in the training set itself. If we had specified a test set, there
would be many mis-classifications in the test set as well.

          Object 1    Object 2    Object 3
Group 1   00          10101
Group 2   01111       1011001
Group 3   11          11011001    001010111111

Fig. 2. Training groups

The system begins to evolve to the next step $t = 1$ by generating more operations.
Applying $G$ to $S_0$, we obtain 4 more substitution operations as follows:

$$S_1 = \{\, 0 \leftrightarrow \theta,\ 1 \leftrightarrow \theta,\ 00 \leftrightarrow \theta,\ 01 \leftrightarrow \theta,\ 10 \leftrightarrow \theta,\ 11 \leftrightarrow \theta \,\}.$$

Applying stability optimization under the transformation system $(P, S_1)$, we
again find no good separation of classes. Even with six rules, no matter how we
spread out the weights, no clear clusters are formed. The training objects still
intermingle with one another in these metric spaces.

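However, when $t = 2$, we have

$$S_2 = \{\, 0 \leftrightarrow \theta,\ 1 \leftrightarrow \theta,\ 00 \leftrightarrow \theta,\ 01 \leftrightarrow \theta,\ 10 \leftrightarrow \theta,\ 11 \leftrightarrow \theta,$$
$$\qquad 000 \leftrightarrow \theta,\ 001 \leftrightarrow \theta,\ 010 \leftrightarrow \theta,\ 011 \leftrightarrow \theta,\ 100 \leftrightarrow \theta,\ 101 \leftrightarrow \theta,\ 110 \leftrightarrow \theta,\ 111 \leftrightarrow \theta \,\}.$$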
Here, we achieve ideal stability $Z = 0$. Specifically, all average intra-group
distances are zero, i.e., $\rho_1 = \rho_2 = \rho_3 = 0$; and all average inter-group distances
share one common value, i.e., $\upsilon_{1,2} = \upsilon_{1,3} = \upsilon_{2,3}$ (the numeric value is illegible
in the source). Fig. 3 shows the ideal weight vector; its entries, a mix of zeros and
fractions, are likewise illegible in the source. Choosing the shortest object from
each group to represent the group, we have 00 from group 1, 01111 from group
2 and 11 from group 3 as representatives. The stable metric space contains
three point-clusters because we have ideal stability. Each cluster is represented
by its chosen prototype. The distance between any two clusters or prototypes is
exactly this common inter-group distance.

Fig. 3. Ideal weights



When an unknown object from P such as 1010010111 is presented to the 
agent to be classified, we calculate its distance from each of the three proto- 
types using the ideal weight vector. If the distance between prototype i and the 
unknown is zero, then the unknown belongs to class i. In this way, all strings 
from the underlying infinite environment P are classified correctly in the on-line 
recognition stage even though only a small-sized training set is used and only 
one object is chosen from each class for comparison. 
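The recognition stage can be illustrated with a short sketch. The learned distance $\Delta_{W^*}$ is not reproduced here; `stand_in_distance` is a hypothetical substitute that only mimics the property stated above (distance zero exactly when two strings fall in the same class). The helper names are ours; the training groups are those of Fig. 2.

```python
def ones_mod3(bits):
    """Remainder of the number of 1's divided by 3 (the triple parity)."""
    return bits.count("1") % 3


def stand_in_distance(a, b):
    """Hypothetical stand-in for the learned distance: 0 inside a class, 1 across."""
    return 0.0 if ones_mod3(a) == ones_mod3(b) else 1.0


training = {
    1: ["00", "10101"],
    2: ["01111", "1011001"],
    3: ["11", "11011001", "001010111111"],
}


def representatives(groups):
    """Keep the shortest object of each training group as its prototype."""
    return {label: min(members, key=len) for label, members in groups.items()}


def classify(obj, reps, dist):
    """Nearest neighbor rule over the stored prototypes."""
    return min(reps, key=lambda label: dist(obj, reps[label]))


reps = representatives(training)
predicted = classify("1010010111", reps, stand_in_distance)
```

The string 1010010111 contains six 1's, so its remainder is 0 and it is assigned to group 1, whose prototype is 00.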

4 Conclusion 

Concept generalization, induction, and indeed, learning itself, are seen here as 
the process of deriving a stable metric space to separate the training groups. 
A stable metric space is one containing well-separated, compact clusters. We 
have extended the model in [3] for supervised inductive learning in three classes. 
All bit strings were classified correctly in the on-line recognition. The agent has 
successfully learned to devise a way to classify bit strings according to 3-parity- 
ness, even though the agent was never told the concept of 3-parity-ness. 
Appendix: Metric Definition

Given a set $S$, a real-valued scalar function $\delta$ of two arguments,

$$\delta : S \times S \to \mathbb{R},$$

is called a metric function on $S$ if for any $x, y, z \in S$ the following conditions are
satisfied:

non-negative         $\delta(x, y) \ge 0$,
semi-reflexive       $x = y \Rightarrow \delta(x, y) = 0$,
symmetric            $\delta(x, y) = \delta(y, x)$,
triangle inequality  $\delta(x, y) + \delta(y, z) \ge \delta(x, z)$.

The pair $(S, \delta)$ is called a metric space or unifying metric space.

Note that the semi-reflexive condition means that the distance between an element
and itself must be 0, and the distance between two different members of the set $S$
could possibly be 0 as well.

In order to see the relationship between this definition of the metric space
and the usual definition, let us define standard metric space by the inclusion of
the following additional axiom:

definiteness         $x = y \Leftarrow \delta(x, y) = 0$.

Now, induce an equivalence relation on the set $S$ by saying that $x \sim y$ if
$\delta(x, y) = 0$ and define a function $\bar{\delta}$
with the set restricted to the set of equivalence classes $\bar{S} = S/\!\sim$ so that
$\bar{\delta}(\bar{x}, \bar{y}) = \delta(x, y)$, where $x \in \bar{x}$ (i.e., $\bar{x}$ is the equivalence class of $x$) and $y \in \bar{y}$.

The pair $(\bar{S}, \bar{\delta})$ is a standard metric space.

Proof by contradiction. Assume $(\bar{S}, \bar{\delta})$ is not a standard metric space. Then
there exist $\bar{x}$ and $\bar{y}$ such that $\bar{x} \ne \bar{y}$ and $\bar{\delta}(\bar{x}, \bar{y}) = 0$. But

$$\bar{\delta}(\bar{x}, \bar{y}) = 0 \Rightarrow \delta(x, y) = 0$$
$$\Rightarrow x \sim y$$
$$\Rightarrow x, y \in \bar{x} \text{ and } x, y \in \bar{y}$$
$$\Rightarrow \bar{x} = \bar{y}.$$

In this way, we can always induce the standard metric space from a given 
(unifying) metric space. Hence, in this regard, we are justified in our usage of 
the term metric space. 
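On a finite sample, the four conditions and the quotient construction can be checked mechanically. The sketch below is illustrative only: the function names are ours, and `d` is a hypothetical distance that is zero between bit strings with the same number of 1's modulo 3.

```python
from itertools import product


def is_unifying_metric(points, d):
    """Check non-negativity, semi-reflexivity, symmetry and the triangle inequality."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < 0 or d(x, x) != 0:
            return False
        if d(x, y) != d(y, x):
            return False
        if d(x, y) + d(y, z) < d(x, z):
            return False
    return True


def quotient_classes(points, d):
    """Group points into equivalence classes of the relation d(x, y) == 0."""
    classes = []
    for p in points:
        for cls in classes:
            if d(p, cls[0]) == 0:
                cls.append(p)
                break
        else:
            classes.append([p])
    return classes


def d(a, b):
    """Hypothetical distance: 0 when the counts of 1's agree modulo 3, else 1."""
    return 0 if a.count("1") % 3 == b.count("1") % 3 else 1


sample = ["00", "10101", "01111", "11", "1011001"]
classes = quotient_classes(sample, d)
```

Distinct classes are at distance 1 from one another, so the induced distance on the classes satisfies the definiteness axiom as well.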
Abstract. This paper presents an effective system for recognizing the
identity of a living person on the basis of iris patterns, which are among
the most reliable physiological and biological features. To represent
the iris pattern efficiently, a new method for optimizing the dimension
of feature vectors using the wavelet transform is proposed. In order to
increase the recognition accuracy of the competitive learning algorithm, an
efficient initialization of the weight vectors and a new method to
determine the winner are also proposed. With all of these novel
mechanisms, the experimental results showed that the proposed system
could be used for personal identification in an efficient and effective
manner.



1 Introduction 

To control access to secure areas or information systems, a reliable personal
identification infrastructure is required. Conventional methods of recognizing the
identity of a person by using a password or cards are not altogether reliable, because
these can be forgotten or stolen. Biometric technology, which is based on biological
and physiological features of humans such as the face, fingerprints, signature and eyes,
is now considered an alternative to existing systems in a great many application
domains, such as the replacement of passwords or ID cards, user authentication in
Internet electronic commerce, and entrance management for specific areas.

Iris patterns have been a focus of biometric technology for the last few decades,
because a human feature should be stable and distinctive in order to be a good feature
for personal identification. Every iris has a fine and unique pattern that does not
change after the age of 2 or 3 years, so it might be called a
kind of optical fingerprint [1][2].
In this paper, we propose some optimized and robust methods for improving the
performance from the practical viewpoint. To achieve the optimization of the
proposed system, we conduct the following work: the analysis of the popular feature
extraction methods - Gabor transform and Haar wavelet transform - to select a good
method suitable for iris patterns, the optimization of the dimension of feature vectors
by using the selected feature extraction method, and the performance improvement of
a competitive learning neural network by revised novel mechanisms for the
initialization of weight vectors and winner selection. The contents of this paper are
related works in chapter 2, analysis and recognition of the iris image in chapter 3,
experimental results in chapter 4, and finally the conclusions in chapter 5.



2 Related Works 

Most work on personal identification and verification by iris patterns was
done in the 1990s [4-8]. Some of these systems have limited capability to recognize the
identity of a person accurately and efficiently, so there is much room for improving
the technologies that affect performance in practice.

Recent noticeable studies in personal identification based on the patterns and
colors of the iris are those of Daugman [4], Boles et al. [6], and Wildes et al. [8].

Daugman implemented a system with a 2-D Gabor wavelet filter for localization of
the iris, a Gaussian transform for feature extraction, and a 256-byte iris code for
computation. The major contribution of Daugman is to provide statistical theories for
the degree of iris code agreement.

Boles et al. implemented a system that operates on a set of 1-D signals and obtains
the zero-crossing representations of these signals. The main idea of Boles et al. is
to represent the features of the iris by fine-to-coarse approximations at different
resolution levels based on the wavelet transform zero-crossing representation. The
prototype also has the advantage of processing 1-D iris signatures rather than the 2-D
images used in both [4] and [8].

The iris recognition system of Wildes et al. consists of an image acquisition rig
(low-light video camera, lens, frame grabber, diffuse polarized illuminator, and reticle
for operator positioning) interfaced to a Sun SPARCstation 20. Most of the work of
Wildes et al. concentrates on grabbing the iris images and on making the routine
procedures of the iris recognition system efficient by applying a Laplacian pyramid and
a hierarchical gradient-based image registration algorithm in pattern matching.

In this paper, the proposed iris recognition system is optimized by the 2-D wavelet
transform for feature extraction, and by a novel initialization method and
multi-dimensional winner selection for recognition. Previous systems use an iris code of
256 bytes or more to represent the features of iris
patterns [4][8], whereas in this paper a feature vector of only 87 bits is exploited to
reduce space and time. The proposed system is evaluated on 4000
experimental iris data from 200 persons to show that the 87-bit feature vector is
good enough to represent iris patterns effectively and efficiently.
3 Analysis and Recognition of Iris Image



The overall structure of the proposed system is illustrated in Fig. 1, and its
processing flow is as follows. First, an image of the region surrounding the human eye
is obtained at a distance by a CCD camera, without any physical contact with the device.
In the preprocessing phase, the following steps are taken. First, we detect the eyelids
and exclude them if they intrude, and eliminate the reflected light caused by the
environmental illumination. Second, we localize the iris, the portion of the
image that is actually processed. Last, the Cartesian coordinate system of the image is
converted into the polar coordinate system so as to facilitate the feature extraction
process. In the feature extraction phase, the 2-D wavelet transform is used to extract a
feature vector from the iris image. In the final phase, the identification and
verification phase, a revised competitive learning method is exploited to classify the
feature vectors and recognize the identity of the person. In order to improve the
efficiency of the system, some improved methods are applied to the feature extraction
phase and the identification phase.







[Block diagram: Human Eye → image acquisition (CCD camera, 55 mm macro lens, 100 W lamp)
→ Preprocessing (noise elimination, iris localization, polar coordinate transform)
→ Feature Extraction (2D wavelet transform) → Identification/Verification (LVQ)]

Fig. 1. The proposed iris recognition system



3.1 Preprocessing Phase 

We first check whether the eyelids intrude into the image, and exclude them if
they do. The reflected light resulting from the environmental illumination is
eliminated by blurring and enhancing the image with a threshold (see Fig. 2). After
eliminating noise, we determine the iris part of the image by localizing the portion of
the image inside the limbus (outer boundary) and outside the pupil
(inner boundary). To localize the iris, the center of the pupil must be found; it
is also used to convert the iris into the polar coordinate system. When the center of the
pupil is found, we find the inner boundary and the outer boundary by extending the
radius of a circle from the center of the pupil and checking the intensity of the
background. Fig. 3 shows the two boundaries for the image of Fig. 2.

Fig. 2. Blurred and enhanced iris image

Fig. 3. Process of finding the inner and outer boundaries from the center of the pupil



The localized iris part of the image is transformed into the polar coordinate system
to extract features in an efficient way. The portion of the pupil is excluded from the
conversion process because it has no biological characteristics at all. Fig. 4 shows the
process of converting the orthogonal coordinate system into the polar coordinate
system for the iris image. By increasing the angle $\theta$ in steps of 0.8° for a given radius $r$,
we obtain 450 values. We get a 450x60 iris image in the plane $(\theta, r)$ by
repeating this process until the radius has been increased to 60.




Fig. 4. Representation of iris image by polar coordinate system
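The polar conversion described above can be sketched as follows. The sampling details are assumptions not given in the text: the pupil center `(cx, cy)` and inner radius `r_inner` are taken as already known, the radius grows by one pixel per row, and each sample uses the nearest pixel, clamped to the image border. The function name `unwrap_iris` is ours.

```python
import math


def unwrap_iris(image, cx, cy, r_inner, n_angles=450, n_radii=60):
    """Sample an annulus around (cx, cy) into an n_radii x n_angles grid.

    image is a list of rows of gray values; 450 angles give a 0.8 degree step.
    """
    height, width = len(image), len(image[0])
    step = 2 * math.pi / n_angles
    polar = []
    for k in range(n_radii):
        r = r_inner + k                      # assumed: one pixel per radial step
        row = []
        for a in range(n_angles):
            theta = a * step
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            x = min(max(x, 0), width - 1)    # clamp to the image border
            y = min(max(y, 0), height - 1)
            row.append(image[y][x])
        polar.append(row)
    return polar


# Synthetic test image whose gray value equals the column index.
demo_image = [[col for col in range(200)] for _ in range(200)]
polar_image = unwrap_iris(demo_image, cx=100, cy=100, r_inner=20)
```

The result has 60 rows of 450 samples each, matching the 450x60 image in the $(\theta, r)$ plane.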



3.2 Feature Extraction Phase 

The Gabor transform and the wavelet transform are typically used for extracting feature
vectors from human iris patterns [4][9][10][11]. In this paper, the wavelet transform is
used to extract a feature vector from the human iris image. Among the mother wavelets,
we use the Haar wavelet as a basis function.

After finishing the preprocessing, we apply the Haar wavelet transform to the image
represented in the polar coordinate system to obtain a feature vector. For the iris
image of size 450x60 obtained from the preprocessing, we apply the wavelet
transform four times in order to get a 28x3 subimage with the same properties as the
original iris image. This means we apply a multiple-level decomposition four times to
the iris image signal. Finally, we organize the feature vector by combining the 84
features of the high-pass filter of the fourth transform (the smallest black box in
Fig. 5) and one representative value for each of the three other high-pass filter areas
(the three other black boxes in Fig. 5). The dimension of the resulting feature vector
is 87. Fig. 5 shows the conceptual process of obtaining the feature vector with the
optimized dimension.




Fig. 5. Conceptual diagram for organizing a feature vector 



Each of the 87 dimensions has a real value between -1.0 and 1.0. To reduce
space and computational time, we quantize each real value into one of two integer
values by simply converting a positive value into 1 and a negative value into 0.
Therefore, we can express an iris image with only 87 bits.
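The construction of the 87-bit code can be sketched as below. Several details are assumptions, because the text does not specify them: the diagonal (high-high) Haar subband is used as "the high-pass filter" output, an odd trailing row or column is dropped at each level, the representative value of a subband is taken to be its mean, and a value of exactly zero maps to 0. The function names are ours.

```python
def haar_step(img):
    """One 2-D Haar level on a list of rows: returns (low-low, high-high) subbands."""
    rows, cols = len(img) // 2, len(img[0]) // 2   # drops an odd trailing row/column
    ll = [[0.0] * cols for _ in range(rows)]
    hh = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            e, f = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            ll[r][c] = (a + b + e + f) / 4.0
            hh[r][c] = (a - b - e + f) / 4.0
    return ll, hh


def iris_code(img, levels=4):
    """84 fourth-level coefficients plus 3 representative values, quantized to bits."""
    representative = []
    current = img
    for level in range(levels):
        current, hh = haar_step(current)
        if level < levels - 1:
            flat = [v for row in hh for v in row]
            representative.append(sum(flat) / len(flat))   # assumed: subband mean
    features = [v for row in hh for v in row] + representative
    return [1 if v > 0 else 0 for v in features]


# Synthetic 60-row by 450-column polar image with values in [-1, 1].
demo = [[((r * 7 + c * 13) % 17) / 8.0 - 1.0 for c in range(450)] for r in range(60)]
code = iris_code(demo)
```

With a 450x60 input the subband sizes shrink to 225x30, 112x15, 56x7 and 28x3, which yields 84 + 3 = 87 bits.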



3.3 Identification and Verification Phase 

In general, a competitive learning neural network such as LVQ learns faster
than the error back-propagation algorithm, but its performance is
easily affected by the initial weight vectors [12][13].

To solve this problem, at least for iris patterns, a new method for initializing the
weight vectors in an effective manner is proposed. This method generates
initial vectors that are located around the boundary of each class. In the learning
process, the usual learning process for LVQ is carried out after initializing the
weight vectors by the proposed method. In the recognition process, we set an
acceptance level and use it to determine whether the final result is accepted or
rejected [15][16].

The proposed initialization algorithm, which we call the uniform
distribution of initial weight vectors, proceeds as follows.

Step 1 Set the initial weight vector of each class to the vector of the first training
       data of that class, and set the other weight vectors to zero:

       $$w_1^k = x_1^k \quad \text{for } k = 1, 2, \ldots, M \qquad (1)$$

       where
       $x_1^k$ : the vector of the first input data of the k-th class,
       $w_1^k$ : the first weight vector of the k-th class,
       $M$ : the number of classes.

Step 2 Select another data item of each class as new training data.

Step 3 Calculate the Euclidean distance $d_j$ between the training data and each weight
       vector by the following equation:

       $$d_j = \sqrt{\sum_{i=1}^{N} \left( x_{ip}^k - w_{ij}^k \right)^2} \qquad (2)$$

       where
       $x_{ip}^k$ : the i-th component of the p-th learning data of the k-th class,
       $w_{ij}^k$ : the i-th component of the j-th weight vector of the k-th class,
       $N$ : the dimension of a training data item.

Step 4 Determine whether the class of the weight vector with the minimum distance
       among all $d_j$ is equal to the class of the training data. If the class of the
       weight vector is not equal to the class of the training data, then add the
       vector of the training data as a new weight vector.

Step 5 Go to Step 2 until all of the training data have been used in the learning process.

The winner selection method based on Euclidean distance that is generally used in
competitive learning neural networks has no problem in determining the minimum
distance of each class. However, as the dimension of the feature vectors increases, it
becomes more likely to select a wrong winner, because the information on each
individual dimension is lost. To solve this problem, a new winner selection
algorithm called the multidimensional winner selection method is proposed. The
proposed algorithm determines the winner of each dimension, counts how often
each class becomes the winner, and then selects the class with the
largest count as the final winner.
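Steps 1 to 5 and the multidimensional winner selection can be sketched together. This is one possible reading, not the authors' code: the names are ours, training data are visited in the order given, the per-dimension winner is the weight vector with the smallest absolute difference in that dimension, and when several vectors tie in a dimension each tied class receives one vote (the tie rule is an assumption; the text does not state one).

```python
import math
from collections import Counter


def init_weights(samples):
    """samples: list of (vector, label). Returns a list of (weight_vector, label)."""
    weights, seen, rest = [], set(), []
    for vec, label in samples:
        if label not in seen:                 # Step 1: first vector of each class
            seen.add(label)
            weights.append((list(vec), label))
        else:
            rest.append((vec, label))
    for vec, label in rest:                   # Steps 2 to 5
        _, nearest = min(weights, key=lambda w: math.dist(vec, w[0]))
        if nearest != label:                  # nearest weight has the wrong class
            weights.append((list(vec), label))
    return weights


def multidim_winner(x, weights):
    """Vote per dimension for the class of the closest weight component."""
    votes = Counter()
    for i, xi in enumerate(x):
        best = min(abs(xi - w[i]) for w, _ in weights)
        tied = {label for w, label in weights if abs(xi - w[i]) == best}
        votes.update(tied)                    # assumed tie rule: all tied classes vote
    return votes.most_common(1)[0][0]


data = [([0, 0], "a"), ([10, 10], "b"), ([1, 1], "a"), ([6, 6], "a"), ([9, 9], "b")]
weights = init_weights(data)
winner_low = multidim_winner([1, 1], weights)
winner_high = multidim_winner([9, 8], weights)
```

In this toy run only the sample (6, 6) is added as an extra weight vector, because its nearest initial weight belongs to the other class; this is how the method places vectors near class boundaries.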



4 Experimental Results 

To evaluate the performance of the proposed human iris recognition system, we
collected 4000 data samples from 200 people over 3 months with the help of
university volunteers. All of the subjects are Asian. The image acquisition
environment is composed of a CCD camera with a 55-mm positive meniscus lens, two
60-Watt halogen lamps, and about 400 infrared LED illuminators. The distance
between the camera and the target ranged from 20 cm to 32 cm. Half of the data is used
as the training data for LVQ and the remaining half as the test data. The parameters
used in LVQ, such as the learning rate and the iteration number, are shown in Table 1.
Table 1. Parameters for LVQ

Initial learning rate      0.1
Update of learning rate    $\alpha(t) = \alpha(0)\left(1 - \dfrac{t}{\text{total number of iterations}}\right)$
Total iterations           300
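The schedule of Table 1 can be written out directly. The update function below is the standard LVQ1 rule, which we assume is what "the usual learning process for LVQ" refers to; the function names are ours.

```python
def learning_rate(t, alpha0=0.1, total=300):
    """Linearly decaying rate from Table 1: alpha(t) = alpha(0) * (1 - t / total)."""
    return alpha0 * (1.0 - t / total)


def lvq1_update(weight, x, same_class, alpha):
    """Standard LVQ1 step: move the winner toward x if classes agree, else away."""
    sign = 1.0 if same_class else -1.0
    return [w + sign * alpha * (xi - w) for w, xi in zip(weight, x)]


rates = [learning_rate(t) for t in (0, 150, 300)]
pulled = lvq1_update([0.0, 0.0], [1.0, 1.0], True, learning_rate(0))
pushed = lvq1_update([0.0, 0.0], [1.0, 1.0], False, learning_rate(0))
```

The rate starts at 0.1, is halved at iteration 150, and reaches zero at iteration 300.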



Under the experimental environment above, the following subsections describe
the results for each phase of the proposed methods.

4.1 Feature Extraction Method

Table 2 shows the recognition rates of two different feature extraction methods,
the Gabor transform and the Haar wavelet transform, under the same classifier. The
recognition rates on the training data are almost the same, but on the test
data the recognition rate of the wavelet transform is better than that of the Gabor
transform by 2.1 percentage points. Therefore, we used the Haar wavelet transform as the
basic feature extraction method in the following experiments.



Table 2. Comparison of two feature extraction methods

                 Gabor Transform    Wavelet Transform
Training data    95.8 %             96.2 %
Test data        92.3 %             94.4 %



4.2 Weight Vector Initialization Method 

Table 3 compares the accuracy of two initialization methods
under the same experimental environment. The proposed method, called
the uniform distribution of initial weight vectors, performed better on both the
training data and the test data than
initialization with random values, which is regarded as the basic initialization method.



Table 3. Comparison of weight vector initialization methods

                 Initialization with random values    Proposed method
Training data    96.2 %                               96.8 %
Test data        94.4 %                               97.0 %



4.3 Winner Selection Method 

Table 4 shows the experimental results for two winner selection methods when we
use the Haar wavelet transform for feature extraction and LVQ with the proposed
initialization method. The proposed multidimensional
method gave the better result for human iris features.

Table 4. Comparison of winner selection methods

                 Euclidean distance method    Multi-dimensional method
Training data    96.8 %                       98.1 %
Test data        97.0 %                       97.6 %



4.4 Size of Feature Vector 

From the three experimental results of 4.1, 4.2, and 4.3, we selected the method
with the highest accuracy in each case to configure a good system for personal
identification based on iris patterns. The selected methods for each phase are as
follows: the Haar wavelet transform for feature extraction, the uniform distribution
method for initializing weight vectors, and the multidimensional method for winner
selection.

With the iris recognition system composed of these methods, we try to minimize the
dimension of the feature vector without affecting the recognition
accuracy. We proposed a new feature extraction process. This method efficiently
represents a feature vector with 87 dimensions and requires only one bit per
dimension. Even after four successive transforms of the image, we can
separate the input space according to the degree of match, as shown in Fig. 6. In
Fig. 6, the black points denote a successful match and the white points denote a
failed match; the x-axis is the group of iris data and the y-axis is the degree of
match. With the 87-dimensional feature vector, a person's identity can be verified at
a degree-of-match threshold of 65%, obtained by calculating the Euclidean
distance between the input vector and the reference vector. If, however, we apply the
transform five times, we cannot maintain a recognition threshold, even though we
obtain a much smaller feature vector, as shown in Fig. 7. Table 5 shows the
performance evaluation according to the size of a feature vector.
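The text does not define the degree of match precisely. One plausible reading, offered here only as an assumption, is the fraction of agreeing bits between two 87-bit codes; for binary vectors the squared Euclidean distance equals the number of disagreeing bits, so this measure decreases as the Euclidean distance grows. The function names and the acceptance rule (match of at least 65%) are illustrative.

```python
def degree_of_match(code_a, code_b):
    """Fraction of positions at which two equal-length bit vectors agree."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    agree = sum(1 for a, b in zip(code_a, code_b) if a == b)
    return agree / len(code_a)


def verify(code, reference, threshold=0.65):
    """Accept the claimed identity when the degree of match reaches the threshold."""
    return degree_of_match(code, reference) >= threshold


reference = [1, 0] * 43 + [1]                              # an arbitrary 87-bit code
close = [1 - b for b in reference[:30]] + reference[30:]   # 30 bits flipped
far = [1 - b for b in reference[:31]] + reference[31:]     # 31 bits flipped
decisions = (verify(close, reference), verify(far, reference))
```

With 87 bits, 57 agreeing bits give a match of about 65.5% and are accepted, while 56 agreeing bits give about 64.4% and are rejected.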

For comparison with the proposed scheme for organizing the feature vector,
we used the 256 dimensions (1 byte per dimension) for each vector introduced
in [4]. All of the experimental results on the proposed methods are summarized in
Table 6. Because the proposed feature vector is more than 20 times smaller than the
256-dimension (1 byte per dimension) vector, an improvement of performance in the
process of recognition and verification is expected.

Table 5. Performance evaluation according to the size of feature vectors

                 256 dimensions (1 byte/dimension)    87 dimensions (1 bit/dimension)
Training data    98.1 %                               98.1 %
Test data        97.6 %                               97.7 %
Fig. 6. Degree of match by 87 dimensions for a feature vector

Fig. 7. Degree of match by 18 dimensions for a feature vector



Table 6. Performance evaluation of the proposed methods

Comparison factors
Feature extraction        Gabor       Wavelet     Wavelet     Wavelet       Wavelet
Weight initialization     Random      Random      Uniform     Uniform       Uniform
Winner selection          Euclidean   Euclidean   Euclidean   Multi-dim.    Multi-dim.
Feature vector size       93 dimensions (4 bytes/dimension: 2976 bits)      87 dimensions
                                                                            (1 bit/dimension: 87 bits)
Success ratio, training   95.8 %      96.2 %      96.8 %      98.1 %        98.1 %
Success ratio, test       92.3 %      94.4 %      97.0 %      97.6 %        97.7 %



5 Conclusions



In this paper, an effective method for personal identification and verification by
means of human iris patterns is presented. To process the iris patterns more
efficiently and effectively than existing methods, the following work was conducted.
First, two methods that are widely used for extracting features, the Gabor transform
and the Haar wavelet transform, were analyzed. In this analysis, the Haar wavelet
transform performed better than the Gabor transform. Second, the Haar wavelet transform
was used for optimizing the dimension of feature vectors in order to reduce
processing time and space. With only 87 bits, we could represent an iris pattern
without affecting the system performance. Last, we improved the accuracy of the
classifier, a competitive learning neural network, by proposing an initialization
method for the weight vectors and a new winner selection method designed for iris
recognition. Thanks to these methods, we could increase the recognition performance
to 97.7% for the test data. From the experimental results, we are convinced that the
proposed system is optimized enough to be applied to various applications.
1 Towards a Knowledge Web 

The World-Wide-Web has traditionally been viewed as a large hypertextual structure. 
However, recent developments in mark-up languages [16, 19], interoperability 
protocols [2] and web-based knowledge representation languages [8, 10, 12] have 
introduced new perspectives on the web. For instance, XML can be used to realise a 
database perspective on the web, supporting data integration across multiple 
applications. The work on RDF and RDFS provides the initial building blocks to go 
beyond simple integration based on structure towards semantic integration. 
Web-based knowledge representation languages such as OIL [8], SHOE [10] and XOL [12] 
move closer to this objective, providing advanced knowledge representation 
functionalities to support semantic interoperability and intelligent search. These 
approaches can be seen as informed by a distinct perspective on the web, which is 
often called the semantic web. The defining aspect of this perspective is the desire to 
move away from uni-dimensional hypertext, or simple data integration, to achieve 
semantic integration between agents, based on shared conceptualizations. These 
shared conceptualizations are normally called ontologies [9]. 

Of course, all of these perspectives are important in their own right and adopting 
one or another largely depends on the specific scenario in hand. For example, let’s 
consider a search scenario. If one takes a semantic web perspective, then search can 
be supported by intelligent engines able to disambiguate a query and home in on the 
‘right’ resource. This approach affords important advantages in terms of efficiency 
and precision. However, there may be cases in which such efficient behavior is 
undesirable, for instance in scenarios where the pedagogical gains afforded by 
browsing through a web of resources are more important than the efficiency of 
homing in quickly on the ‘right’ resource. 

S. A. Cerri and D. Dochev (Eds.): AIMSA 2000, LNAI 1904, pp. 358-361, 2000. 
© Springer-Verlag Berlin Heidelberg 

In addition to the three perspectives mentioned above, there is another perspective 
which can be imposed on the World-Wide-Web, which we call the knowledge 
web [7]. This perspective characterizes the web as the locus in which knowledge is 
created, shared and reused. A number of web-based technologies, developed at the 
Knowledge Media Institute of The Open University in the UK, support these processes of 
knowledge creation, sharing and reuse over the web. These technologies include 
tools supporting document-centred discussion and debate [14, 18], tools for 
collaborative ontology development [4], modelling languages [13], high-level 
interfaces supporting semantic queries [14], publishing tools [5, 6] and online 
reasoning services [11]. These technologies have been used in over a dozen projects 
in domains such as guideline-centred healthcare [15], digital libraries [1], electronic 
publishing [5] and to support creation and sharing of best-practice repositories in 
industrial organizations. While of course each scenario presents distinct challenges, a 
number of research questions cut across all our work: 

• How can we use knowledge technologies to facilitate discussion and debate on the 
web, beyond what is provided by structured discussion spaces? 

• What are the appropriate frameworks and modelling languages which allow reuse 
of knowledge components over the web? 

• How can knowledge and HCI technologies be harnessed to allow non-experts 
to collaboratively develop domain models, which can then be exploited by 
reasoning agents? 

• What are the characteristics that make a domain suitable for an 
approach like ours, which uses ontologies to support collaborative model building 
by non-experts? 

Detailed (although necessarily partial) answers to these questions can be found in 
the given references to our work. In a nutshell, the web provides a powerful 
infrastructure, accessible from anywhere, where tools can be seamlessly integrated 
and different services can be delivered in a variety of media. Within this context, the 
integration of knowledge and HCI technologies appears to be a promising approach to 
support knowledge creation, sharing and reuse both within specific organizations and 
within generic communities of practice. However, subtle issues apply here and a 
holistic approach, one that takes into account the various user-related, 
organization-related and technology-related issues, is needed in order to provide 
effective solutions. 
Technologies have to be seamlessly integrated with existing work-practices and the 
knowledge sharing activities should ideally occur as by-products of normal work 
activities. Much research points out that “formality can be harmful” [17] and 
therefore it is important that modelling solutions only impose the minimal degree of 
structure needed to support the envisaged services. A related, crucial issue is that the 
underlying ontologies have to be shared and (ideally) ‘owned’ by all potential users, 
to ensure the feasibility of collaborative model construction. Finally, as is well 
known from the knowledge management literature [3], no matter how sophisticated 
the technology, knowledge sharing and reuse can only take place if the appropriate 
organizational culture and motivation apply. There is more to knowledge sharing and 
reuse than any particular technology. 
Continuations and Conversations 

Networking forces us more and more to adopt a protocol-centric view of programming, 
that is, to stop developing monolithic applications and, rather, to glue together 
components that exchange messages. One of the key problems, then, is to design how 
components interact. Interaction design is not limited to the conception of an API 
(Application Programmer’s Interface) stating the functionalities and the nature of 
the exchanged data, but includes how to make these components behave appropriately 
throughout their lifetime. 

A conversation with a component is often viewed as a single-threaded pro- 
cess. A component is a server that keeps a state for each user and evolves that 
state as messages come in. The classical way to program servers is to adopt an 
event-driven style with a single event loop that dispatches messages according 
to the user’s state. Unfortunately, this model is inappropriate with regard to 
the current use of the web. 

The use of browsers to surf the Internet has developed new conversation styles. It is 
rather common to fill in a form, submit it and analyze the results obtained. If these 
results are unsatisfactory, the user may go back to the previous form, adjust the 
answer and re-submit it. This is the “what if” style favored by the Back and Forward 
buttons offered by browsers. This use of backtracking does not fit well with the 
event-driven style, as the server is required to detect the regression of the user and 
to fetch some past state of that user. 

But the situation is even more complex, since browsers also allow users to clone 
windows and, for example, to fill in the same form in two different windows with 
different information and to submit both concurrently. Clearly, the server now has to 
process, concurrently and for the same user, two different independent requests. The 
event-driven style cannot cope simply with these new demands: such conversations are 
too difficult to sustain. 
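
To make the difficulty concrete, here is a minimal Python sketch of the event-driven style under our own assumptions: a hypothetical two-step form conversation (name, then age), one stored state per user, and a `dispatch` function standing in for the event loop. None of these names come from the talk.

```python
# Hypothetical two-step conversation: ask for a name, then for an age.
user_state = {}  # one *current* state per user, as in the classical style

def dispatch(user, message):
    """Handle one incoming message according to the user's current state."""
    state = user_state.get(user, {"step": "ask_name"})
    if state["step"] == "ask_name":
        user_state[user] = {"step": "ask_age", "name": message}
        return "How old are you, %s?" % message
    if state["step"] == "ask_age":
        user_state[user] = {"step": "done", "name": state["name"], "age": message}
        return "%s is %s." % (state["name"], message)
    return "Conversation already finished."

first = dispatch("alice", "Alice")   # the name form is submitted
second = dispatch("alice", "30")     # the age form is submitted
# The user presses Back and re-submits the age form with another value.
# Only the latest state was kept, so the server cannot honor the request.
third = dispatch("alice", "31")
```

Supporting the re-submission would require the server to keep past states and to work out which one each request refers to, which is the burden described above.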

Fortunately, there exists a powerful concept named continuation that per- 
fectly fits these new requirements. The sole programming language, widely 
taught as well as formally defined, that offers continuations is Scheme. At any 
given point in a program under evaluation, the continuation of an expression 
represents what to do next with the value of that expression. Thus a continua- 
tion naturally appears as a unary function that expects a value in order to be 
resumed. 
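
As a small illustration, consider the expression `(1 + x) * 2`. The continuation of `x` is "add 1, then double". Python has no first-class continuations, so the sketch below writes that continuation by hand as an ordinary unary function; the name `continuation_of_x` is ours.

```python
def continuation_of_x(value):
    """What remains to be done with the value of x in (1 + x) * 2."""
    return (1 + value) * 2

result = continuation_of_x(20)   # resume the computation with x = 20
again = continuation_of_x(4)     # resume the same continuation with x = 4
```

In Scheme, such a function does not have to be written by hand: it can be captured from the running program.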

Continuations come from the work of Strachey and Wadsworth [7,6] and were first used 
to denote control features such as goto, i.e., unconditional jump. Continuations were 
shown to be the basic feature supporting all sequential control operators such as 
escapes, exception handling, coroutines [8] or engines [2]. 
If a conversation is represented by a program in a server, every interaction point 
(displaying a page to the user, asking a question, etc.) can be associated with a 
continuation; the user may resume any of these continuations. A continuation 
automatically packs everything needed to resume a conversation (in implementational 
terms: registers and call stack). Continuations remove the need for a single, big, 
complex event loop surrounded by a number of modules implementing transitions 
(pages, servlets, etc.). 
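
The idea can be sketched in Python, with the caveat that Python cannot capture continuations automatically: here each continuation is a closure written by hand and stored in a table under a token that a page could carry back to the server. The helpers `show_page` and `resume`, the token scheme and the two-question conversation are all assumptions made for illustration.

```python
import itertools

continuations = {}             # token -> continuation (a unary function)
_tokens = itertools.count(1)

def show_page(text, continuation):
    """Register the continuation of an interaction point, return the page."""
    token = "k%d" % next(_tokens)
    continuations[token] = continuation
    return token, text

def resume(token, answer):
    """Resume the conversation at the interaction point named by token."""
    return continuations[token](answer)

def conversation():
    def after_name(name):
        def after_age(age):
            return None, "%s is %s." % (name, age)
        return show_page("How old are you, %s?" % name, after_age)
    return show_page("What is your name?", after_name)

k1, page1 = conversation()
k2, page2 = resume(k1, "Alice")
_, final_a = resume(k2, "30")
_, final_b = resume(k2, "31")    # Back button or cloned window: same point again
k3, page3 = resume(k1, "Bob")    # jump back to the very first question
```

Because each stored closure carries its own captured variables, resuming an older interaction point does not disturb the others, and no per-user "current state" has to be tracked.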

Continuations allow a whole conversation to be concentrated in a single program 
written in the so-called direct style. A program transformation exists, named CPS 
(standing for Continuation-Passing Style), that converts a program using continuation 
operators into a somewhat bigger, more opaque program where these operators are 
eliminated in favor of explicit continuations reified as regular unary functions. It 
is possible to write programs directly in CPS, and this is indeed the case with the 
event-loop style. Direct style is of course easier to read, write and maintain [1]. 
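
The difference between the two styles can be seen on a toy function. The sketch below sums a list once in direct style and once in CPS, with the transformation applied by hand; the function names are ours and the example is only meant to show the shape of the code.

```python
def total_direct(prices):
    """Direct style: the result is returned to the caller."""
    if not prices:
        return 0
    return prices[0] + total_direct(prices[1:])

def total_cps(prices, k):
    """CPS: the result is passed to the explicit continuation k instead."""
    if not prices:
        return k(0)
    # The continuation of the recursive call adds the head, then calls k.
    return total_cps(prices[1:], lambda rest: k(prices[0] + rest))

direct = total_direct([1, 2, 3])
via_cps = total_cps([1, 2, 3], lambda value: value)
```

Both compute the same value, but in the CPS version "what to do next" has become an explicit argument that could be stored and invoked later.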

Continuations simplify work-flow programming; they also offer the possibility of 
richer conversations, where one may jump back to some older question and answer it 
again, or even answer the same question twice concurrently. These new possibilities 
open up exciting new forms of conversation. 

The talk will present continuations [3, chapters] and their use within conversations. 
It will be based on two recent papers: [4] (in French), which presents an overview of 
an educational CD-ROM using continuations, and [5], which analyzes some problems 
raised by continuations and browsers. 